Overarching themes


The following three overarching themes have been fully integrated throughout the Pearson Edexcel 
AS and A level Mathematics series, so they can be applied alongside your learning and practice. 

1. Mathematical argument, language and proof

• Rigorous and consistent approach throughout
• Notation boxes explain key mathematical language and symbols
• Dedicated sections on mathematical proof explain key principles and strategies
• Opportunities to critique arguments and justify methods

2. Mathematical problem solving

• Hundreds of problem-solving questions, fully integrated into the main exercises
• Problem-solving boxes provide tips and strategies
• Structured and unstructured questions to build confidence
• Challenge boxes provide extra stretch

The Mathematical Problem-solving cycle: specify the problem → collect information → process and represent information → interpret results

3. Mathematical modelling

• Dedicated modelling sections in relevant topics provide plenty of practice where you need it
• Examples and exercises include qualitative questions that allow you to interpret answers in the context of the model
• Dedicated chapter in Statistics & Mechanics Year 1/AS explains the principles of modelling in mechanics


Finding your way around the book

Access an online digital edition using the code at the front of the book.

Each chapter starts with a list of objectives


The real world applications 
of the maths you are about 
to learn are highlighted at 
the start of the chapter with 
links to relevant questions in 
the chapter 


The Prior knowledge check 
helps make sure you are 
ready to start the chapter 


A level content is clearly flagged

Exercises are packed with exam-style questions to ensure you are ready for the exams

Challenge boxes give you a chance to tackle some more difficult questions

Each section begins with explanation and key learning points

Each chapter ends 
with a Mixed exercise 
and a Summary of 
key points 




Step-by-step 
worked examples 
focus on the key 
types of questions 
you'll need to 
tackle 


Exam-style questions are flagged with (E)

Problem-solving questions are flagged with (P)
Every few chapters a Review exercise 


helps you consolidate your learning 
with lots of exam-style questions 


Problem-solving boxes
provide hints, tips and 
strategies, and Watch out 
boxes highlight areas where 
students often lose marks in 
their exams 


Exercise questions 
are carefully graded 
so they increase 

in difficulty and 
gradually bring you 
up to exam standard 



AS and A level practice papers 
at the back of the book help you 
prepare for the real thing. 


Extra online content


Whenever you see an Online box, it means that there is extra online content available to support you. 


SolutionBank 


SolutionBank provides a full worked solution for 
every question in the book. 


Online: Full worked solutions are available in SolutionBank.


Download all the solutions as a PDF or 
quickly find the solution you need online 


Use of technology

Explore topics in more detail, visualise problems and consolidate your understanding using pre-made GeoGebra activities.

Online: Find the point of intersection graphically using technology.

GeoGebra-powered interactives

Interact with the maths you are learning using GeoGebra's easy-to-use tools


Access all the extra online content for free at: 


www.pearsonschools.co.uk/fsimaths 


You can also access the extra online content by scanning this QR code: 




After completing this chapter you should be able to:

• Find the expected value of a discrete random variable X
• Find the expected value of X²
• Find the variance of a discrete random variable
• Use the expected value and variance of a function of X
• Solve problems involving random variables

Discrete random variables are an important tool in probability. Banks and stockmarket traders use random variables to model their risks on investments that have an element of randomness. By calculating the expected value of their profits, they can be confident of making money in the long term.


1 The random variable X¥ ~ B(12, 2). Find: 


0. wd) 
b P(XY S 2) 
CPG < ess Wi) 


Statistics and Mechanics 
< Year 1, Section 6.3 


The discrete random variable Y has 
probability mass function P(Y = y) = ky%, 
y=1,2,4,5, 10. 
a Find the value of k. 
b Find P(Y is prime). 


Statistics and Mechanics 
< Year 1, Section 6.1 
Solve simultaneously: 
3x+2y-—z=5 
2x-y=8 


x-z=3 © Pure Year 1, Chapter 3 


Chapter 1: Discrete random variables

1.1 Expected value of a discrete random variable

Recall that a random variable is a variable whose value depends on a random event. The random variable is discrete if it can only take certain numerical values.

The probabilities of any discrete random variable add up to 1. For a discrete random variable, X, you write ΣP(X = x) = 1. ← Statistics and Mechanics Year 1, Chapter 6

If you take a set of observations from a discrete random variable, you can find the mean of those observations. As the number of observations increases, this value will get closer and closer to the expected value of the discrete random variable.

Watch out: The expected value is a theoretical quantity, and gives information about the probability distribution of a random variable.

■ The expected value of the discrete random variable X is denoted E(X) and defined as E(X) = ΣxP(X = x).

Notation: The expected value is sometimes referred to as the mean, and is denoted by μ.


Example 1

A fair six-sided dice is rolled. The number on the uppermost face is modelled by the random variable X.

a Write down the probability distribution of X.
b Use the probability distribution of X to calculate E(X).

a
x        | 1   | 2   | 3   | 4   | 5   | 6
P(X = x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

b The expected value of X is:

E(X) = ΣxP(X = x) = 1/6 + 2/6 + ... + 6/6 = 3.5

If you know the probability distribution of X then you can calculate the expected value. Notice that in Example 1 the expected value is 3.5, but P(X = 3.5) = 0. The expected value of a random variable does not have to be a value that the random variable can actually take. Instead this tells us that in the long run, we would expect the average of all rolls to get close to 3.5.
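The long-run behaviour described above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper name `expected_value`, the seed and the number of simulated rolls are all illustrative choices.

```python
from fractions import Fraction
import random

def expected_value(dist):
    """Return E(X) = sum of x * P(X = x) for a {value: probability} mapping."""
    return sum(x * p for x, p in dist.items())

# Fair six-sided dice: each face has probability 1/6.
dice = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(dice.values()) == 1  # probabilities must add up to 1

e_x = expected_value(dice)  # exact value as a fraction

# Simulate many rolls; the sample mean should settle near E(X).
rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(100_000)]
sample_mean = sum(rolls) / len(rolls)

print("E(X) =", e_x, "sample mean =", round(sample_mean, 3))
```

The exact calculation uses fractions so that no rounding error appears; the simulated mean is only an estimate and will vary with the seed.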


Example

The random variable X has a probability distribution as shown in the table.

x        | 1   | 2 | 3   | 4 | 5
P(X = x) | 0.1 | p | 0.3 | q | 0.2

a Given that E(X) = 3, write down two equations involving p and q.
b Find the value of p and the value of q.

Problem-solving: Remember that the probabilities must add up to 1. You will often have to use ΣP(X = x) = 1 when solving problems involving discrete random variables.

a p + q + 0.1 + 0.3 + 0.2 = 1
  p + q = 1 - 0.6
  p + q = 0.4 (1)
  (1 × 0.1) + 2p + (3 × 0.3) + 4q + (5 × 0.2) = 3
  2p + 4q = 3 - (0.1 + 0.9 + 1)
  2p + 4q = 1 (2)
b 2p + 4q = 1 (from (2))
  2p + 2q = 0.8 (2 × (1))
  So 2q = 0.2
  q = 0.1
  p = 0.4 - q
    = 0.4 - 0.1
    = 0.3

If X is a discrete random variable, then X² is also a discrete random variable. You can use this rule to determine the expected value of X².

■ E(X²) = Σx²P(X = x)

Any function of a random variable is also a random variable. → Section 1.3


Example

A discrete random variable X has the following probability distribution

x        | 1     | 2    | 3    | 4
P(X = x) | 12/25 | 6/25 | 4/25 | 3/25

a Write down the probability distribution for X².
b Find E(X²).

a The probability distribution for X² is

x          | 1     | 2    | 3    | 4
x²         | 1     | 4    | 9    | 16
P(X² = x²) | 12/25 | 6/25 | 4/25 | 3/25

b E(X²) = Σx²P(X² = x²)
        = 1 × 12/25 + 4 × 6/25 + 9 × 4/25 + 16 × 3/25
        = 120/25
        = 4.8

Watch out: E(X²) is, in general, not equal to (E(X))². In this example E(X) = 1.92 and 1.92² ≠ 4.8.
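The difference between E(X²) and (E(X))² in this example can be confirmed with a few lines of Python. This is an illustrative sketch only; the helper `expectation` is a hypothetical name, not something defined in the text.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """Return E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

# Distribution from the example above.
dist = {
    1: Fraction(12, 25),
    2: Fraction(6, 25),
    3: Fraction(4, 25),
    4: Fraction(3, 25),
}

e_x = expectation(dist)                     # E(X)
e_x2 = expectation(dist, lambda x: x ** 2)  # E(X²)

print(float(e_x), float(e_x2), float(e_x ** 2))
```

Passing the function to apply as an argument means the same helper covers E(X), E(X²) and any other E(g(X)).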


Exercise 1A

1 For each of the following probability distributions write out the distribution of X² and calculate both E(X) and E(X²).

a
x        | 2   | 4   | 6   | 8
P(X = x) | 0.3 | 0.3 | 0.2 | 0.2

b
x        | -2  | -1  | 1   | 2
P(X = x) | 0.1 | 0.4 | 0.1 | 0.4

Watch out: Note that, for example, P(X² = 4) = P(X = 2) + P(X = -2).

2 The score on a biased dice is modelled by a random variable X with probability distribution

x        | 1   | 2   | 3   | 4   | 5   | 6
P(X = x) | 0.1 | 0.1 | 0.1 | 0.2 | 0.4 | 0.1

Find E(X) and E(X²).

3 The random variable X has a probability function
P(X = x) = 1/x, x = 2, 3, 6

a Construct tables giving the probability distributions of X and X².
b Work out E(X) and E(X²).
c State whether or not (E(X))² = E(X²).

4 The random variable X has a probability function given by

P(X = x) = 2^(-x)   for x = 1, 2, 3, 4
           1/16     for x = 5

a Construct a table giving the probability distribution of X.
b Calculate E(X) and E(X²).
c State whether or not (E(X))² = E(X²).

5 The random variable X has the following probability distribution:

x        | 1   | 2 | 3 | 4   | 5
P(X = x) | 0.1 | a | b | 0.2 | 0.1

Given that E(X) = 2.9, find the value of a and the value of b. (5 marks)

6 The random variable X has the following probability distribution:

x        | -2  | -1 | 1 | 2
P(X = x) | 0.1 | a  | b | c

Given that E(X) = 0.3 and E(X²) = 1.9, find a, b and c. (7 marks)

Problem-solving: You can use the given information to write down simultaneous equations for a, b and c which can be solved using the matrix inverse operation on your calculator. ← Core Pure Book 1, Section 6.6

7 The discrete random variable X has probability function 
a(1 - x) x =-2,-1,0 


P(Y=x)= i — 


Given that E(X) = 1.2, find the value of a and the value of b. (6 marks)


8 A biased six-sided dice has a < chance of landing on any of the numbers 1, 2, 3 or 4. 


The probabilities of landing on 5 or 6 are unknown. The outcome is modelled as a random variable, X. Given that E(X) = 4.1,


a find the probability distribution of X. (5 marks) 
The dice is rolled 10 times. 
b Calculate the probability that the dice lands on 6 at least 3 times. (3 marks) 


9 Jorge has designed a game for his school fete. Students can pay £1 to roll a fair six-sided dice. If they score a 6 they win a prize of £5. If they score a 4 or a 5, they win a smaller prize of £P.

By modelling the amount paid out in prize money as a discrete random variable, determine the maximum value of P in order for Jorge to not make a loss on his game. (5 marks)

Hint: The expected profit from the game is the cost of playing the game minus the expected value of the amount paid out in prize money.


Challenge 


Three fair six-sided dice are rolled. The discrete random variable X is
defined as the largest value of the three values shown. Find E(X).


1.2 Variance of a discrete random variable

If you take a set of observations from a discrete random variable, you can find the variance of those observations. As the number of observations increases, this value will get closer and closer to the variance of the discrete random variable.

Notation: The variance is sometimes denoted by σ², where σ is the standard deviation.

■ The variance of X is usually written as Var(X) and is defined as Var(X) = E((X - E(X))²)

The random variable (X - E(X))² is the squared deviation from the expected value of X. It is large when X takes values that are very different to E(X).

Online: Explore the variance of a discrete random variable and compare the theoretical distribution with observed results generated from that discrete random variable using GeoGebra.

■ Sometimes it is easier to calculate the variance using the formula Var(X) = E(X²) - (E(X))²

From the definition you can see that Var(X) ≥ 0 for any random variable X. The larger Var(X) the more variable X is. In other words, the more likely it is to take values very different to its expected value.
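The two variance formulas above always give the same value. The Python sketch below checks this for an illustrative distribution chosen here (the number of heads when two fair coins are spun); the function names are hypothetical.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """Return E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

def variance_by_definition(dist):
    """Var(X) = E((X - E(X))²)."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)

def variance_by_shortcut(dist):
    """Var(X) = E(X²) - (E(X))²."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2

# Illustrative distribution (an assumption, not from the text):
# number of heads when two fair coins are spun.
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

v1 = variance_by_definition(heads)
v2 = variance_by_shortcut(heads)
print(v1, v2)
```

Because every term in the definition is a square multiplied by a probability, the result can never be negative, which is the point made in the paragraph above.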


Example

A fair six-sided dice is rolled. The number on the uppermost face is modelled by the random variable X.
Find Var(X).

Method 1
We have that E(X) = 3.5.

The distributions of X, X² and (X - E(X))² are given by

x            | 1    | 2    | 3    | 4    | 5    | 6
x²           | 1    | 4    | 9    | 16   | 25   | 36
(x - E(X))²  | 6.25 | 2.25 | 0.25 | 0.25 | 2.25 | 6.25
P(X = x)     | 1/6  | 1/6  | 1/6  | 1/6  | 1/6  | 1/6

So the variance is

Var(X) = Σ(x - E(X))² P(X = x)
       = 6.25 × 1/6 + 2.25 × 1/6 + ... + 6.25 × 1/6
       = (6.25 + 2.25 + 0.25) × 1/3 = 35/12

Method 2
The expected value of X² is

E(X²) = Σx²P(X = x) = 1/6 × (1 + 4 + ... + 36) = 91/6

So using the alternative formula

Var(X) = E(X²) - (E(X))² = 91/6 - 49/4 = 35/12
Exercise 1B


1 The random variable X has a probability distribution given by 


x -1 0 
P(X = x) 


ale}| dN 
ios) 


1 
a 
5 


ale 


ale 
ale 


a Find E(X). 
b Find Var(X). 


2 Find the expected value and variance of the random variable XY with probability distributions 


given by: 
a|x 1 2 3 
1 1 1 
P(X = x) : 2 ra 
c |x —2 -1 1 2 
1 1 1 1 
P(X = x) El 3 6 6 


3 Given that Y is the score when a single, unbiased, eight-sided dice is rolled, find E(Y) and Var(Y). 


6 


P(X =x) 


Ale 


Rl 


Ble] 


4 Two fair, cubical dice are rolled and S is the sum of their scores. Find:
a the distribution of S
b E(S)
c Var(S)
d the standard deviation, σ

Hint: The standard deviation of a random variable is the square root of its variance.

5 Two fair, tetrahedral (four-sided) dice are rolled and D is the difference between their scores.
Find:
a the distribution of D
b E(D)
c Var(D)

6 A fair coin is spun repeatedly until a head appears or three spins have been made. The random variable T represents the number of spins of the coin.

a Show that the probability distribution of T is

t        | 1   | 2   | 3
P(T = t) | 1/2 | 1/4 | 1/4
(3 marks)

b Find the expected value and variance of T. (6 marks)

7 The random variable X has a probability distribution given by

x        | 1 | 2 | 3
P(X = x) | a | b | a

where a and b are constants.
a Write down an expression for E(X) in terms of a and b. (2 marks)
b Given that Var(X) = 0.75, find the values of a and b. (5 marks)


1.3 Expected value and variance of a function of X

If X is a discrete random variable, and g is a function, then g(X) is also a discrete random variable. You can calculate the expected value of g(X) using the formula:

■ E(g(X)) = Σg(x)P(X = x)

This is a more general version of the formula for E(X²). For simple functions, such as addition and multiplication by a constant, you can learn the following rules:

■ If X is a random variable and a and b are constants, then E(aX + b) = aE(X) + b
■ If X and Y are random variables, then E(X + Y) = E(X) + E(Y)

You can use a similar rule to simplify variance calculations for some functions of random variables:

■ If X is a random variable and a and b are constants then Var(aX + b) = a²Var(X)
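The rules for E(aX + b) and Var(aX + b) can be checked by building the distribution of aX + b directly. The sketch below does this in Python for a made-up distribution and made-up constants a = 3, b = -2; all names and numbers are illustrative assumptions.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """Return E(g(X)) = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X²) - (E(X))²."""
    return expectation(dist, lambda x: x * x) - expectation(dist) ** 2

def transform(dist, g):
    """Distribution of g(X); probabilities of equal g(x) values are added."""
    out = {}
    for x, p in dist.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out

# Illustrative distribution and constants (assumptions, not from the text).
X = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
a, b = 3, -2
Y = transform(X, lambda x: a * x + b)

lhs_mean, rhs_mean = expectation(Y), a * expectation(X) + b
lhs_var, rhs_var = variance(Y), a ** 2 * variance(X)
print(lhs_mean, rhs_mean, lhs_var, rhs_var)
```

Note that the constant b shifts every value by the same amount, so it changes the mean but has no effect on the spread, which is why b does not appear in the variance rule.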


Example

A discrete random variable X has the probability distribution

x        | 1     | 2    | 3    | 4
P(X = x) | 12/25 | 6/25 | 4/25 | 3/25

a Write down the probability distribution for Y where Y = 2X + 1.
b Find E(Y).
c Compute E(X) and verify that E(Y) = 2E(X) + 1.

a The probability distribution for Y is

x        | 1     | 2    | 3    | 4
y        | 3     | 5    | 7    | 9
P(Y = y) | 12/25 | 6/25 | 4/25 | 3/25

b E(Y) = ΣyP(Y = y)
       = 3 × 12/25 + 5 × 6/25 + 7 × 4/25 + 9 × 3/25
       = 121/25
       = 4.84

c E(X) = ΣxP(X = x) = 1 × 12/25 + 2 × 6/25 + 3 × 4/25 + 4 × 3/25 = 48/25 = 1.92

Therefore 2E(X) + 1 = 2 × 1.92 + 1 = 4.84
Therefore E(Y) = 2E(X) + 1


Example

A random variable X has E(X) = 4 and Var(X) = 3. Find:

a E(3X) b E(X - 2)
c Var(3X) d Var(X - 2)
e E(X²)

a E(3X) = 3E(X) = 3 × 4 = 12

b E(X - 2) = E(X) - 2 = 4 - 2 = 2

c Var(3X) = 3²Var(X) = 9 × 3 = 27

d Var(X - 2) = Var(X) = 3

e E(X²) = Var(X) + (E(X))² = 3 + 4² = 19


Example

Two fair 10p coins are spun. The random variable X pence represents the total value of the coins that land heads up.

a Find E(X) and Var(X).

The random variables S and T are defined as follows:
S = X - 10 and T = ½X - 5

b Show that E(S) = E(T).
c Find Var(S) and Var(T).

A large number of observations of S and T are taken.

d Comment on any likely differences or similarities.

a The probability distribution of X is

x        | 0   | 10  | 20
P(X = x) | 1/4 | 1/2 | 1/4

E(X) = 10 by inspection

Var(X) = E(X²) - (E(X))²
Var(X) = 0² × 1/4 + 10² × 1/2 + 20² × 1/4 - 10² = 50

b E(S) = E(X - 10) = E(X) - 10 = 10 - 10 = 0
  E(T) = E(½X - 5) = ½E(X) - 5 = ½ × 10 - 5 = 0
  So E(S) = E(T).

c Var(S) = Var(X - 10) = Var(X) = 50
  Var(T) = (½)²Var(X) = 50/4 = 12.5

d The means of both sets of observations should be close to zero. The observed values of S will be more spread out than the observed values of T.


Example

The random variable X has the following probability distribution:

x        | 0°  | 30° | 60° | 90°
P(X = x) | 0.4 | 0.2 | 0.1 | 0.3

Calculate E(sin X).

The distribution of sin X is

x        | 0°  | 30° | 60°  | 90°
sin x    | 0   | 1/2 | √3/2 | 1
P(X = x) | 0.4 | 0.2 | 0.1  | 0.3

E(sin X) = Σ sin x P(X = x)
         = 0 × 0.4 + 1/2 × 0.2 + √3/2 × 0.1 + 1 × 0.3
         = (8 + √3)/20
         ≈ 0.487

Exercise 1C

1 The random variable X has a probability distribution given by

x        | 1   | 2   | 3   | 4
P(X = x) | 0.1 | 0.3 | 0.2 | 0.4

a Write down the probability distribution for Y where Y = 2X - 3.
b Find E(Y).
c Calculate E(X) and verify that E(2X - 3) = 2E(X) - 3.


2 The random variable X has a probability distribution given by 


x        | -2  | -1  | 0   | 1   | 2
P(X = x) | 0.1 | 0.1 | 0.2 | 0.4 | 0.2


a Write down the probability distribution for Y where Y= X°. 
b Calculate E(Y). 


3 The random variable X has E(X) = 1 and Var(X) = 2. Find:
a E(8X) b E(X + 3) c Var(X + 3)
d Var(3X) e Var(1 - 2X) f E(X²)


4 The random variable X has E(X) = 3 and E(X²) = 10. Find:
a E(2X) b E(3 - 4X) c E(X² - 4X)
d Var(X) e Var(3X + 2)


5 The random variable X has a mean μ and standard deviation σ.
Find, in terms of μ and σ:
a E(4X) b E(2X + 2) c E(2X - 2)
d Var(2X + 2) e Var(2X - 2)


In a board game, players roll a fair, six-sided dice each time they make it around the board. 
The score on the dice is modelled as a discrete random variable X. 


a Write down E(X). 


They are paid £200 plus £100 times the score on the dice. The amount paid to each player is 
modelled as a discrete random variable Y. 

b Write Yin terms of X. 

c Find the expected pay-out each time a player makes it around the board. 


John runs a pizza parlour that sells pizza in three sizes: small (20 cm diameter), medium (30cm 
diameter) and large (40cm diameter). Each pizza base is | cm thick. John has worked out that 


on average, customers order a small, medium or large pizza with probabilities i sr and 5 


respectively. Calculate the expected amount of pizza dough needed per customer. 


Two tetrahedral dice are rolled. The random variable X represents the result of subtracting the 
smaller score from the larger. 


a Find E(X) and Var(X). (7 marks)
The random variables Y and Z are defined as Y = 2% and Z = at 

b Show that E(Y) = E(Z). (3 marks) 
c Find Var(Z). (2 marks) 


Challenge

Show that E((X - E(X))²) = E(X²) - (E(X))².

Hint: You can assume that E(X + Y) = E(X) + E(Y).


1.4 Solving problems involving random variables


Suppose you have two random variables X and Y = g(X). If g is one-to-one, and you know the mean and variance of Y, then it is possible to deduce the mean and variance of X.
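For the linear case g(X) = aX + b with a ≠ 0, the rules from Section 1.3 can be rearranged to give E(X) = (E(Y) - b)/a and Var(X) = Var(Y)/a². The Python sketch below applies that rearrangement; it is restricted to linear g, and the function name and the numbers used (Y = 2X + 3, E(Y) = 11, Var(Y) = 8) are made up for illustration.

```python
from fractions import Fraction

def recover_mean_and_variance(a, b, mean_y, var_y):
    """Given Y = a*X + b with a != 0, return (E(X), Var(X)).

    Rearranges E(Y) = a*E(X) + b and Var(Y) = a²*Var(X).
    """
    if a == 0:
        raise ValueError("a must be non-zero for the mapping to be one-to-one")
    mean_x = (Fraction(mean_y) - b) / a
    var_x = Fraction(var_y) / (a * a)
    return mean_x, var_x

# Illustrative values only (assumptions, not taken from the text).
mean_x, var_x = recover_mean_and_variance(a=2, b=3, mean_y=11, var_y=8)
print(mean_x, var_x)
```

The check for a = 0 matters because Y would then be constant and would carry no information about X.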


Example

X is a discrete random variable. The discrete random variable Y is defined as Y = (X - 150)/50.
Given that E(Y) = 5.1 and Var(Y) = 2.5, find:

a E(X)
b Var(X).

a X = 50Y + 150
  E(X) = E(50Y + 150)
       = 50E(Y) + 150
       = 255 + 150
       = 405
b Var(X) = Var(50Y + 150)
         = 50²Var(Y)
         = 2500 × 2.5
         = 6250


Example

The discrete random variable X has a probability distribution given by


x        | -2  | -1 | 0    | 1 | 2
P(X = x) | 0.3 | a  | 0.25 | b | c

The discrete random variable Y is defined as Y = 3X - 1.
Given that E(Y) = -2.5 and Var(Y) = 13.95, find:

a E(X) and E(X²)
b the values of a, b and c
c P(X > Y)

a X = (Y + 1)/3
  E(X) = E((Y + 1)/3) = (-2.5 + 1)/3 = -0.5
  Var(X) = Var((Y + 1)/3) = (1/3)² × 13.95 = 1.55, using Var(aX + b) = a²Var(X)
  So E(X²) = Var(X) + (E(X))² = 1.55 + 0.25 = 1.8.

b The probabilities add up to 1:
  0.3 + a + 0.25 + b + c = 1
  Rearranging
  a + b + c = 0.45 (1)

  E(X) = -2 × 0.3 - 1 × a + 0 × 0.25 + 1 × b + 2 × c = -0.5
  Rearranging
  -a + b + 2c = 0.1 (2)

  E(X²) = 4 × 0.3 + 1 × a + 0 × 0.25 + 1 × b + 4 × c = 1.8
  Rearranging
  a + b + 4c = 0.6 (3)

  Writing equations (1), (2) and (3) as a single matrix equation:

  (  1  1  1 ) ( a )   ( 0.45 )
  ( -1  1  2 ) ( b ) = ( 0.1  )
  (  1  1  4 ) ( c )   ( 0.6  )

  So, by inverting the matrix we find

  ( a )         (  2  -3   1 ) ( 0.45 )   ( 0.2  )
  ( b ) = 1/6 × (  6   3  -3 ) ( 0.1  ) = ( 0.2  )
  ( c )         ( -2   0   2 ) ( 0.6  )   ( 0.05 )

  So a = 0.2, b = 0.2 and c = 0.05.

c P(X > Y) = P(X > 3X - 1) = P(X < 1/2)
  So P(X > Y) = 0.3 + 0.2 + 0.25 = 0.75


Exercise 1D

1 X is a discrete random variable. The random variable Y is defined by Y = 4X - 6. Given that E(Y) = 2 and Var(Y) = 32, find:

a E(X)

b Var(X)

c the standard deviation of X.


X is a discrete random variable. The random variable Y is defined by Y = tat 


Given that E(Y) = -1 and Var(Y) = 9, find: 
a E(X) 

b Var(X) 

ce E(x’) 


The discrete random variable X has a probability distribution given by 


x i oi 3.5) a 
P(X=x) | 03 | a | b&b | 02 


The random variable Y is defined by Y = 2X + 3. Given that E(Y) = 8, find the values of a and b. 


The discrete random variable X has a probability distribution given by 


x 90° | 180° | 270° 
P(X=x) | a b | 03 


The random variable Y is defined as Y = sin X°. 
a Find the range of possible values of E(Y). (5 marks) 
b Given that E(Y) = 0.2, write down the values of a and b. (2 marks) 


5 The discrete random variable X has a probability distribution given by

x        | -2 | -1 | 0 | 1 | 2
P(X = x) | a  | b  | c | b | a

The random variable Y is defined as Y = (X + 1)².

a Given that E(Y) = 2.4 and P(Y > 2) = 0.4, show that:
  2a + 2b + c = 1
  10a + 4b + c = 2.4
  a + b = 0.4
b Hence find the values of a, b, and c.
ec Find P(2X¥+3 Y). 


6 The discrete random variable X has a probability distribution given by

P(X = x) = a   for x = 1, 2, 3
           b   for x = 4, 5
           c   for x = 6

Suppose that Y is defined by Y = 1 - 2X.

a Given that E(Y) = -5.6 and P(Y ≤ -5) = 0.6, write down the value of E(X).

b Show that:
  3a + 2b + c = 1
  2a + 3b + 2c = 1.1
  a + 2b + c = 0.6
c Solve the system to find values for a, b, c.


d Find P(¥>5+4 Y). 


Mixed exercise 1

1 The random variable X has the probability function
P(X = x) = x/21, x = 1, 2, 3, 4, 5, 6


a Construct a table giving the probability distribution of X. 
Find: 


4 


(1 mark) 


(4 marks) 
(2 marks) 
(2 marks) 


b P(2< XS) c E(X) d Var(X) 
e Var(3 - 2X) f E(x) 
2 The discrete random variable X has the probability distribution given in the table below. 
x —2 -1 0 1 2 3 
P(X = x) 0.1 0.2 | 0.3 r 0.1 0.1 
Find: 
ar b P(-1 < X <2) c E(2X + 3) d Var(2X + 3) 




3 A discrete random variable X has the probability distribution shown in the table below. 


x 0 1 
P(X =x) + b | t+b 
a Find the value of b. b Show that E(Y) = 1.3. 
c Find the exact value of Var(X). d Find the exact value of PLY = 1.5). 


The discrete random variable X has a probability function

P(X = x) = k(1 - x)   for x = 0, 1
           k(x - 1)   for x = 2, 3
           0          otherwise

where k is a constant.

a Show that k = 1/4. (2 marks)
b Find E(X) and show that E(X²) = 5.5. (4 marks)
c Find Var(2X - 2). (4 marks)


A discrete random variable X has the probability distribution 


x 0 1 2 3 

PX=x) | a | 3 | s | g 

Find: 

a P(l<X¥ 2) b E(X) ec E3X-1) 
d Var(X) e E(log(X + 1)) 


A discrete random variable Y has the probability distribution 


x 1 2 3 4 

P(v¥=x) | 04 | 02 | 01 | 03 

Find: 

a P(3< X?< 10) b E(XY) c Var(X) 
a (35%) e EW) f EQ™) 


A discrete random variable is such that each of its values is assumed to be equally likely. 

a Write the name of the distribution. 

b Give an example of such a distribution. 

A random variable X has discrete uniform distribution and can take values 0, 1, 2, 3 and 4. 
Find: 

c E(X) d Var(X) 
The random variable X has the probability distribution

x        | 1   | 2 | 3 | 4   | 5
P(X = x) | 0.1 | p | q | 0.3 | 0.1

a Given that E(X) = 3.1, write down two equations involving p and q.
Find:
b the value of p and the value of q
c Var(X)

d Var(2X — 3) 


The random variable X has the probability function 


pursay= | tt x=1,2 

k(x-2) x=3,4,5 

where k is a constant. 

a Find the value of k. (2 marks) 
b Find the exact value of E(X). (1 mark) 
c Show that, to three significant figures, Var(Y) = 2.02. (2 marks) 
d Find, to one decimal place, Var(3 - 2X). (1 mark) 


The random variable X has the discrete uniform distribution
P(X = x) = 1/6, x = 1, 2, 3, 4, 5, 6

a Write down E(X) and show that Var(X) = 35/12. (4 marks)
b Find E(2X - 1). (2 marks) 
ce Find Var(3 - 2X). (2 marks) 
d Find E(2%). (3 marks) 
The random variable X has the probability function 
—~,,_3*-1 _ 

P(Y= x)= % x=1,2,3,4 
a Construct a table giving the probability distribution of X. 
Find: 
b P(2< X¥ <4) c the exact value of E(Y). 


d Show that Var(Y) = 0.92 to two significant figures. e Find Var(1 - 3X). 


The random variable Y has mean 2 and variance 9. 

Find: 

a E(3Y+1) b E2-3Y) ec Var(3Y +1) 

d Var(2 -3Y) e E(Y’) f E(Y-1)(¥+1)) 
The random variable T has a mean of 20 and a standard deviation of 5. 


The random variable S is defined as S = 37+ 4. 
Find E(S) and Var(S). 


14 A fair spinner is made from the disc in the diagram and the random variable X represents the number it lands on after being spun.

a Write down the distribution of X. b Work out E(X).
c Find Var(X). d Find E(2X + 1).
e Find Var(3X - 1).


15 The discrete variable X has the probability distribution


x -1 0 1 2 

P(X = x) 0.2 0.5 0.2 0.1 

Find: 

a E(X) b Var(X) c EGX+1) d Var($X + 1) 


The discrete random variable X has a probability distribution given by 


x -1 0 1 2 
P(X = x) 0.1 0.3 a b 


The random variable Y is defined Y = 3.XY — 1. Given that E(Y) = 1.1, 


a find the values of a and b. (5 marks) 
b Calculate E(X?) and Var(X) using the values of a and b that you found in part a. (3 marks) 
c Write down the value of Var(Y). (1 mark) 
d Find P(Y+2> 4X). (2 marks) 


The discrete random variable X has a probability distribution given by 


x -2 0 2 3 4 
P(X = x) a b a b é 


The random variable Y is defined as Y = sat 


You are given that E(Y) = —0.98 and P(Y = -1) = 0.4. 


a Write down three simultaneous equations in a, b and c. (4 marks) 
b Solve this system to find the values of a, b and c. (3 marks) 
c Find P(-2X> 10Y). (2 marks) 
Challenge

Let n be a positive integer and suppose that X is a discrete random variable with P(X = i) = 1/n for i = 1, ..., n.

Show that E(X) = (n + 1)/2 and Var(X) = (n + 1)(n - 1)/12.

Hint: You can make use of the following results:
Σi = n(n + 1)/2 and Σi² = n(n + 1)(2n + 1)/6, where each sum runs from i = 1 to n.
← Core Pure Book 1, Chapter 3


Chapter 1 


Summary of key points

The expected value of the discrete random variable X is denoted E(X) and defined as
E(X) = Σ x P(X = x)

The expected value of X² is E(X²) = Σ x² P(X = x)

The variance of X is usually written as Var(X) and is defined as
Var(X) = E((X − E(X))²)

Sometimes, it is easier to calculate the variance using the formula
Var(X) = E(X²) − (E(X))²

If X is a discrete random variable, and g is a function, then g(X) is also a discrete random
variable.

You can calculate the expected value of g(X) using the formula:
E(g(X)) = Σ g(x) P(X = x)

If X is a random variable and a and b are constants, then E(aX + b) = aE(X) + b.
If X and Y are random variables, then E(X + Y) = E(X) + E(Y)

If X is a random variable and a and b are constants then Var(aX + b) = a²Var(X)
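
The key points above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names are illustrative assumptions, and the distribution used (x = −1, 0, 1, 2 with probabilities 0.2, 0.5, 0.2, 0.1) is taken from one of the exercise tables earlier in this chapter.

```python
# Probability distribution as a mapping x -> P(X = x) (values from an exercise table).
dist = {-1: 0.2, 0: 0.5, 1: 0.2, 2: 0.1}

def expectation(dist, g=lambda x: x):
    """E(g(X)) = sum of g(x) * P(X = x); with the default g this is E(X)."""
    return sum(g(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - (E(X))^2."""
    return expectation(dist, lambda x: x ** 2) - expectation(dist) ** 2

def transform(dist, a, b):
    """Distribution of aX + b, assuming a != 0 so that distinct x stay distinct."""
    return {a * x + b: p for x, p in dist.items()}

mean_x, var_x = expectation(dist), variance(dist)
y = transform(dist, 3, -1)            # Y = 3X - 1
mean_y, var_y = expectation(y), variance(y)

print(mean_x, var_x)   # E(X) and Var(X)
print(mean_y, var_y)   # should agree with 3E(X) - 1 and 9Var(X), up to rounding
```

Working from the transformed distribution and comparing with aE(X) + b and a²Var(X) is a useful check on hand calculations.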


After completing this chapter you should be able to: 


• Use the Poisson distribution to model real-world situations
• Use the additive property of the Poisson distribution
• Understand and use the mean and variance of the Poisson distribution
• Understand and use the mean and variance of the binomial distribution
• Use the Poisson distribution as an approximation to the binomial distribution


1 The random variable X ~ B(35, 0.4). Find:
a P(X = 20) b P(X < 6) 


c P5= X¥ < 20) 
€ Statistics and Mechanics Year 1, Chapter 6 


2. A biased dice is modelled by a random 
variable X with the following probability 
distribution. 


x 
P(X = x) 
Find: 
a E(X) b E(X²)
c Var(X) ← Sections 1.1, 1.2


The Poisson distribution is used to 
model the number of times an event 
occurs within a fixed period of time. 


Scientists use Poisson distributions to
model the frequency of meteor strikes.


Chapter 2 


2.1 The Poisson distribution

The exponential function e^x can be defined as an infinite series expansion:

e^x = 1 + x/1! + x²/2! + x³/3! + ... + x^r/r! + ...

This is the Maclaurin expansion of e^x.
← Core Pure 2, Chapter 2

This definition can be used to generate a probability
distribution with parameter λ, where λ > 0.

e^λ = 1 + λ/1! + λ²/2! + λ³/3! + ... + λ^r/r! + ...

Dividing both sides by e^λ gives

1 = e^(−λ) + e^(−λ)λ/1! + e^(−λ)λ²/2! + e^(−λ)λ³/3! + ... + e^(−λ)λ^r/r! + ...


Notice that the sum of the infinite series on the right-hand side equals 1,
so you could use its terms as probabilities to define a probability distribution.

The sum of the probabilities in any
probability distribution must equal 1.
← Statistics and Mechanics Year 1, Section 6.1


If we let X be a discrete random variable such that X takes the values 0, 1, 2, 3, ... then the probability
distribution for X could be:

x          0        1            2             3             ...  r
P(X = x)   e^(−λ)   e^(−λ)λ/1!   e^(−λ)λ²/2!   e^(−λ)λ³/3!   ...  e^(−λ)λ^r/r!


This distribution is called the Poisson distribution.

■ If X ~ Po(λ), then the Poisson distribution is given by
P(X = x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, 3, ...

Watch out: This is an infinite probability distribution. P(X = x) > 0 for any positive integer x, although as x gets large, the probabilities get very small.

Notation: You say that X has a Poisson distribution with parameter λ.

Example 1

The random variable X ~ Po(2.1). Find:
a P(X = 3)
b P(X ≥ 1)
c P(1 < X ≤ 4)

a P(X = 3) = e^(−2.1) × 2.1³ / 3!
= 0.189011...
= 0.1890 (4 d.p.)
Use the formula P(X = x) = e^(−λ) λ^x / x! with x = 3 and λ = 2.1. You can work this out using the Poisson probability distribution function on your calculator.

b P(X ≥ 1) = 1 − P(X = 0)
= 1 − e^(−2.1)
= 1 − 0.1224...
= 0.8775 (4 d.p.)
X can only take non-negative integer values. Round probabilities to 4 decimal places.

c P(1 < X ≤ 4)
= P(X = 2) + P(X = 3) + P(X = 4)
= e^(−2.1) × 2.1²/2! + e^(−2.1) × 2.1³/3! + e^(−2.1) × 2.1⁴/4!
= 0.2700... + 0.1890... + 0.0992...
= 0.5583 (4 d.p.)
Add together all the possible probabilities. The positive integers that satisfy the inequality are 2, 3 and 4.

Online: Explore the Poisson distribution using GeoGebra.
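
If you prefer to check these values with a short program rather than a calculator, the formula P(X = x) = e^(−λ) λ^x / x! translates directly into code. This is an illustrative Python sketch only; the name `poisson_pmf` is an assumption, and the three probabilities are read as P(X = 3), P(X ≥ 1) and P(1 < X ≤ 4) for X ~ Po(2.1).

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam), where x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.1
p_a = poisson_pmf(3, lam)                            # P(X = 3)
p_b = 1 - poisson_pmf(0, lam)                        # P(X >= 1) = 1 - P(X = 0)
p_c = sum(poisson_pmf(x, lam) for x in (2, 3, 4))    # P(1 < X <= 4)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The rounded results should match the 4 d.p. answers in the worked solution.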


1 The discrete random variable XY ~ Po(2.5). Find: 
a P(X¥=3) b P(X¥> 1) c PO<XS3) 


2 The discrete random variable XY ~ Po(3.1). Find: 
a P(X¥=4) b P(X¥2 2) c POs x4) 


3 The discrete random variable X ~ Po(4.2). Find: 
a P(X¥=2) b P(Y S3) ec P35 XS) 


4 The discrete random variable XY ~ Po(0.84). Find: 
a P(X¥=1) b P(Y¥2 1) c PO<XS3) 


5 The discrete random variable X ~ Po(λ). Given that P(X = 2) = P(X = 3), find λ.

6 The discrete random variable X ~ Po(λ). Given that P(X = 4) = 3 × P(X = 2), find λ.


Calculations involving the Poisson distribution can often be simplified by using the cumulative
distribution tables given on page 191. These tables will be given in the Mathematical Formulae and
Statistical Tables booklet in your exam. These will tell you P(X ≤ x) for values of λ between 0 and 10,
in steps of 0.5, and for values of x from 0 to 22.

You can also use the Poisson cumulative distribution function on your calculator to find P(X ≤ x) for
other values of λ and x.


Example 2

The random variable X ~ Po(5). Find, using tables:
a P(X ≤ 3) b P(X ≥ 2) c P(1 ≤ X ≤ 4)

a P(X ≤ 3) = 0.2650

Extract from the tables of P(X ≤ x):
x    λ = 4.5   λ = 5.0
0    0.0111    0.0067
1    0.0611    0.0404
2    0.1736    0.1247
3    0.3423    0.2650
4    0.5321    0.4405

b P(X ≥ 2) = 1 − P(X ≤ 1) = 1 − 0.0404
= 0.9596
Be careful. The inequality is ≥ so you need to work out 1 − P(X ≤ 1).

c P(1 ≤ X ≤ 4) = P(X ≤ 4) − P(X ≤ 0)
= 0.4405 − 0.0067
= 0.4338
= 04336 


Example 3

The random variable X ~ Po(7.5). Find the values of a, b and c such that:
a P(X ≤ a) = 0.2414 b P(X < b) = 0.5246 c P(X ≥ c) = 0.3380

a P(X ≤ a) = 0.2414
P(X ≤ 5) = 0.2414 (use tables with λ = 7.5)
So a = 5

b P(X < b) = P(X ≤ b − 1) = 0.5246
P(X ≤ 7) = 0.5246 (use tables with λ = 7.5)
so b − 1 = 7
b = 8

c P(X ≥ c) = 1 − P(X ≤ c − 1) = 0.3380
so P(X ≤ c − 1) = 1 − 0.3380
= 0.6620
P(X ≤ 8) = 0.6620 (use tables with λ = 7.5)
so c − 1 = 8
c = 9
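
Reading the tables "backwards" like this amounts to searching for the value of x whose cumulative probability matches a given figure. A rough Python sketch of that search is shown below; the helper names and the tolerance of 0.00005 (half a unit in the fourth decimal place) are assumptions made for illustration.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** k / factorial(k) for k in range(x + 1))

def find_x(target, lam, x_max=22):
    """Return the x in 0..x_max with P(X <= x) equal to target to 4 d.p., or None."""
    for x in range(x_max + 1):
        if abs(poisson_cdf(x, lam) - target) < 0.00005:
            return x
    return None

lam = 7.5
a = find_x(0.2414, lam)            # P(X <= a) = 0.2414
b = find_x(0.5246, lam) + 1        # P(X < b) = P(X <= b - 1)
c = find_x(1 - 0.3380, lam) + 1    # P(X >= c) = 1 - P(X <= c - 1)

print(a, b, c)
```

As in the worked solution, the strict inequality and the "at least" statement are converted to the form P(X ≤ x) before searching.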


Exercise 2B

Use the Poisson cumulative distribution tables on page 191 to answer these questions.

1 The discrete random variable X ~ Po(5.5). Find:
a P(X S3) b P(X = 6) ec P3S<X7) 


The discrete random variable XY ~ Po(10). Find: 
a P(XY = 8) b P(7 SX S12) c P(4<X<9) 


The discrete random variable XY ~ Po(3.5). Find: 
a P(X = 2) b PBS XS<6) ec PQ2<XS5) 


The discrete random variable X ~ Po(4.5). Find: 
a P(X¥=5) b P(3< XS) ec Pi =X<7) 


The discrete random variable XY ~ Po(8). Find the values of a, b, c and d such that: 
a P(X¥ <a)=0.3134 b P(X S4))=0.7166 c¢ P(X <c)=0.0996 d P(X > d)=0.8088 


The discrete random variable XY ~ Po(3.5). Find the values of a, b, c and d such that: 
a P(X <a)=0.8576 b P(X >b)=0.6792. ¢ P(¥ Sc) = 0.95 d P(X > d) < 0.005 


Poisson distributions 


2.2 Modelling with the Poisson distribution


You need to be able to recognise situations that can be modelled with a Poisson distribution. The 
Poisson distribution is used to model the number of times, X, that a particular event occurs within a 
given interval of time or space. 


■ In order for the Poisson distribution to be a good model, the events must occur:
• independently
• singly, in space or time
• at a constant average rate so that the mean number in an interval is proportional to the
length of the interval

The parameter, λ, in the Poisson distribution is the average number of times that the event will occur
in a single interval.


Examples of where a Poisson distribution might be appropriate are:

• the number of radioactive particles being emitted by a certain source in a 5-minute period
• the number of telephone calls to a switchboard in a 10-minute interval
• the number of spelling mistakes on a page of a newspaper
• the number of cars passing the front of a school in a 3-minute interval
• the number of raisins in a fruit scone.
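
Because the mean number of events is proportional to the length of the interval, a known rate can be rescaled to any interval before probabilities are calculated. The Python sketch below illustrates the idea; the function names and the figures used (4 events per 60 minutes, observed over 90 minutes) are assumptions chosen purely for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

def scaled_parameter(rate, rate_interval, target_interval):
    """Mean count in target_interval, given `rate` events per rate_interval (same units)."""
    return rate * target_interval / rate_interval

# Hypothetical figures: 4 events per 60 minutes, observed over a 90-minute interval.
lam_90 = scaled_parameter(4, 60, 90)
p_five = poisson_pmf(5, lam_90)     # probability of exactly 5 events in 90 minutes

print(lam_90, round(p_five, 4))
```

This rescaling is only justified when events occur at a constant average rate, which is one of the modelling conditions listed above.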


Example 4


An internet service provider has a large number of users regularly connecting to the internet. 

On average, 4 users every hour fail to connect to the internet at their first attempt. 

a Give two reasons why a Poisson distribution might be a suitable model for the number of failed 
connections every hour. 

b Find the probability that in a randomly chosen hour: 
i 2 users fail to connect at their first attempt 
ii more than 6 users fail to connect at their first attempt. 


c Find the probability that in a randomly chosen 90-minute period: 
i 5 users fail to connect at their first attempt 
ii fewer than 7 users fail to connect at their first attempt. 


a Failed connections occur singly and at a
constant rate of 4 users per hour.

b X = the number of failed connections in one hour
X ~ Po(4)
Define your random variable, and write down the model you are using.

i P(X = 2) = 0.1465 (4 d.p.)

ii P(X > 6) = 1 − P(X ≤ 6)
= 1 − 0.88932...
= 0.1107 (4 d.p.)
Use the tables, or your calculator, with λ = 4 to find P(X ≤ 6).

c Y = the number of failed connections in 90 minutes
Y ~ Po(6)
Problem-solving: Because the failures occur at a constant average rate, the value of the parameter λ will be 1.5 × 4 = 6 for a 90-minute period.

i P(Y = 5) = 0.1606 (4 d.p.)
ii P(Y < 7) = P(Y ≤ 6)
= 0.6063 (4 d.p.)

Exercise 2C


1 The maintenance department of a school receives requests for replacement light bulbs at a rate 


of 3 per week. 


The number of requests, X, in a given week is modelled as XY ~ Po(3). 


a Find the probability that, in a randomly chosen week, the number of requests for 


replacement light bulbs is: 
i exactly 4 
ii more than 5.


b Find the probability that, in a randomly 
chosen fortnight, the number of requests 


for replacement light bulbs is: 
i exactly 6 
ii no more than 4. 


Hint ) The number of requests, Y, in a given 
fortnight can be modelled as Y ~ Po(6). 


2 A botanist suggests that the number of weeds growing in a field can be modelled by a Poisson 


distribution. 


a Write down two conditions that must apply for this model to be applicable. 


Assuming this model and that weeds occur at a rate of 1.3 per m², find:

b the probability that, in a randomly chosen plot of size 4 m², there will be fewer than 3 weeds

c the probability that, in a randomly chosen plot of 5 m², there will be more than 8 weeds.

Hint: The number of weeds, X, in a plot of 4 m² can be modelled as X ~ Po(4 × 1.3), i.e. X ~ Po(5.2).


An electronics company manufactures a component for use in computer hardware. At the 
end of the manufacturing process, each component is checked to see if it is faulty. Faulty 
components are detected at a rate of 2.5 per hour. 


a Suggest a suitable model for the number of faulty components detected per hour. 


b Describe, in the context of this question, two assumptions you have made in part a for this 
model to be suitable. 


c Find the probability of 2 faulty components being detected in a 1-hour period. 
d Find the probability of at least 6 faulty components being detected in a 3-hour period. 
e Find the probability of at least 7 faulty components being detected in a 4-hour period. 


A call-centre agent handles telephone calls at a rate of 15 per hour. 


a Find the probability that, in any randomly selected 20-minute interval, the agent handles: 
i exactly 4 calls ii more than 8 calls. 


b Find the probability that, in a randomly selected 30-minute interval, the agent handles: 
i at least 6 calls ii no more than 10 calls.


The average number of cars crossing over a bridge is 180 per hour. Assuming a Poisson 
distribution, find the probability that: 


a more than 5 cars will cross in any given minute 


b fewer than 4 cars will cross in any 2-minute period. 


A café serves breakfast every morning. Customers arrive for breakfast at random at an average 
rate of 1 every 4 minutes. 
Find the probability that on a Friday morning between 10 am and 10:20 am: 


a fewer than 3 customers arrive for breakfast 
b more than 10 customers arrive for breakfast. 


An estate agent has been selling houses at a rate of 1.8 per week. 
a Find the probability that in a particular week she sells: 


i no houses ii 3 houses iii at least 3 houses. (6 marks) 
The estate agent meets her weekly target if she sells at least 3 houses in one week.

b Find the probability that over a period of 4 consecutive weeks she meets her weekly target exactly once. (3 marks)

Problem-solving: Use a binomial model for part b. ← Statistics and Mechanics Year 1, Chapter 6


Patients arrive at a hospital accident and emergency department at random at a rate of 5 per 
hour. 


a Find the probability that, during any 30-minute period, the number of patients arriving at 
the hospital accident and emergency department is: 
i exactly 4 ii at least 3. (5 marks) 


A patient arrives at 11:00 am. 
b Find the probability that the next patient arrives before 11:15 am. (3 marks) 




The lift in a block of flats breaks down at random at a mean rate of three times per four-week 
period. 
a Find the probability that the lift breaks down: 

i at least once in one week 

ii exactly twice in one week. (5 marks) 


In one particular week, the lift broke down twice. 


b Write down the probability that the lift will break down at some point in the next week. 
Give a reason for your answer. (2 marks) 


Flaws occur at random in a particular type of material at a mean rate of 1.5 per 50m. 


a Find the probability that, in a randomly chosen 50m length of this material, there will 
be exactly 3 flaws. (2 marks) 


This material is sold in rolls of length 200m. 

b Find the probability that a single roll has fewer than 4 flaws. (3 marks) 
Priya buys 5 rolls of this material. 

c Find the probability that at least two of these rolls will have fewer than 4 flaws. (5 marks) 


A company produces chocolate chip biscuits. The number of chocolate chips per biscuit has a 
Poisson distribution with mean 5. 


a Find the probability that one of these biscuits, selected at random, contains fewer than 
3 chocolate chips. (2 marks) 


A packet contains 6 of these biscuits, selected at random. 


b Find the probability that exactly half of the biscuits in the packet contain fewer than 
3 chocolate chips. (4 marks) 


A company has minibuses that can only be hired for a week at a time. All hiring starts on a 
Sunday. During the summer, the mean number of requests for minibuses each Sunday is 5. 


a Calculate the probability that fewer than 4 requests for minibuses are made on a particular 
Sunday in summer. (2 marks) 


The company wants to be at least 99% sure they can fulfil all requests on any particular 
Sunday. 


b Calculate the number of minibuses the company must have in order to satisfy this 
condition. (3 marks) 


On a typical summer’s day, a boat company hires out rowing boats at a rate of 9 per hour. 


a Find the probability of hiring out at least 6 boats in a randomly selected 30-minute 
period. (2 marks) 


The company has 8 boats and decides to hire them out for 20-minute periods. 
b Show that the probability of running out of boats is less than 1%. (3 marks) 


c Find the number of boats that the company should have in order to be 99% sure of 
meeting all demands if the hire period is extended to 30 minutes. (3 marks) 




14 Breakdowns on a particular machine occur at a rate of 1.5 per week. 


a Find the probability that no more than 2 breakdowns occur in a randomly chosen week. (2 marks) 
b Find the probability of at least 5 breakdowns in a randomly chosen two-week period. (3 marks) 


A maintenance firm offers a contract for repairing breakdowns over a six-week period. 
The firm will give a full refund if there are more than n breakdowns in a six-week period. 
The firm wants the probability of having to pay a refund to be 5% or less. 


c Find the smallest possible value of n. (3 marks) 
2.3 Adding Poisson distributions

If two Poisson variables X and Y are independent, then the variable Z = X + Y also has a Poisson distribution.

■ If X ~ Po(λ) and Y ~ Po(μ), then
X + Y ~ Po(λ + μ)

Watch out: When you apply this result in context, the random variables X and Y must both model events occurring within the same interval of time or space.


Example 5

If X ~ Po(3.6) and Y ~ Po(4.4), find:

a P(X + Y = 7) b P(X + Y ≤ 5)

X + Y ~ Po(3.6 + 4.4) (add the parameters)
X + Y ~ Po(8)

a P(X + Y = 7) = e^(−8) × 8⁷ / 7! = 0.1396 (4 d.p.)
Use the tables, or your calculator, with λ = 8.

b P(X + Y ≤ 5) = 0.1912 (4 d.p.)
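
The additive property can be checked numerically by working out P(X + Y = 7) the long way, summing P(X = k) × P(Y = 7 − k) over every possible split, and comparing the result with the Po(8) probability. This is an illustrative Python sketch under the assumption that X and Y are independent; the function names are not from the text.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

def sum_pmf(z, lam, mu):
    """P(X + Y = z) for independent X ~ Po(lam), Y ~ Po(mu), summing over X = k, Y = z - k."""
    return sum(poisson_pmf(k, lam) * poisson_pmf(z - k, mu) for k in range(z + 1))

direct = sum_pmf(7, 3.6, 4.4)                     # sum over all splits of 7
via_added_parameter = poisson_pmf(7, 3.6 + 4.4)   # single Poisson with parameter 8

print(round(direct, 4), round(via_added_parameter, 4))
```

The two values agree (up to floating-point rounding), which is what the result X + Y ~ Po(λ + μ) predicts.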


Example 6


The number of cars passing an observation point in a 5-minute interval is modelled by a Poisson 
distribution with mean 2. The number of other vehicles passing the observation point in a 
15-minute interval is modelled by a Poisson distribution with mean 3. 


Find the probability that: 
a exactly 5 vehicles, of any type, pass the observation point in a 10-minute interval 
b more than 8 vehicles, of any type, pass the observation point in a 15-minute interval. 


a X₁ = number of cars passing in a 10-minute interval
Y₁ = number of other vehicles passing in a 10-minute interval
X₁ ~ Po(4), Y₁ ~ Po(2)
X₁ + Y₁ ~ Po(4 + 2)
X₁ + Y₁ ~ Po(6)
P(X₁ + Y₁ = 5) = 0.1606 (4 d.p.)
Problem-solving: You need to model the number of cars passing in a 10-minute interval, and the number of other vehicles passing in a 10-minute interval. The time intervals must be the same before you can add the parameters.

b X₂ = number of cars passing in a 15-minute interval
Y₂ = number of other vehicles passing in a 15-minute interval
X₂ ~ Po(6), Y₂ ~ Po(3)
X₂ + Y₂ ~ Po(6 + 3)
X₂ + Y₂ ~ Po(9)
Define new random variables for the number of cars, and other types of vehicle, passing in a 15-minute interval.

P(X₂ + Y₂ > 8) = 1 − P(X₂ + Y₂ ≤ 8)
= 1 − 0.4557
= 0.5443 (4 d.p.)
This can be calculated using the tables, or your calculator, with λ = 9.
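
The two steps used here, rescaling each rate to a common interval and then adding the parameters, can be sketched in Python as follows. The function names are illustrative assumptions; the rates (a mean of 2 cars per 5 minutes and 3 other vehicles per 15 minutes) are those given in the example.

```python
from math import exp, factorial

def poisson_cdf(x, lam):
    """P(X <= x) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** k / factorial(k) for k in range(x + 1))

def combined_parameter(minutes, sources):
    """Total mean over `minutes`, where sources is a list of (mean, interval_in_minutes)."""
    return sum(mean * minutes / interval for mean, interval in sources)

sources = [(2, 5), (3, 15)]                  # cars, other vehicles
lam_10 = combined_parameter(10, sources)     # 4 + 2 = 6
lam_15 = combined_parameter(15, sources)     # 6 + 3 = 9

p_a = poisson_cdf(5, lam_10) - poisson_cdf(4, lam_10)   # P(total = 5) in 10 minutes
p_b = 1 - poisson_cdf(8, lam_15)                        # P(total > 8) in 15 minutes

print(lam_10, lam_15, round(p_a, 4), round(p_b, 4))
```

Adding the parameters is only valid here because both counts have been expressed over the same interval and are assumed to be independent.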


Exercise 2D

1 X and Y are independent random variables such that X ~ Po(3.3) and Y ~ Po(2.7). Find:

a P(X + Y = 5) b P(X + Y ≤ 7) c P(X + Y > 4)

A and B are independent random variables such that A ~ Po(3.25) and B ~ Po(4.25). Find:
a P(A + B = 7) b P(A + B ≤ 5) c P(A + B > 9)

X and Y are independent random variables such that X ~ Po(2.5) and Y ~ Po(3.5). Find:

a P(X = 2 and Y = 2) b P(both X and Y are greater than 2)

c P(X + Y = 5) d P(X + Y ≤ 4)

The number of emissions per minute from two different sources of radioactivity are modelled
as independent Poisson random variables X and Y, with parameters of 3 and 5 respectively.
Calculate the probability that, in a given one-minute period, 


a the number of emissions from each source is at least 3 


b the total number of emissions from the two sources is no more than 6. 


During a weekday at a certain point of a road, cars pass by at a rate of 24 per minute, 
while lorries pass by at a rate of 8 per minute. 
a Find the probability that, in any 15-second period, 

i at least 4 of each type of vehicle passes by 

ii the total number of cars and lorries that pass by is no more than 9. 


b Write down one assumption that you have made in your calculations. 


A taxi company supplies two particular organisations independently. 
Company A orders taxis at a rate of 1.25 cars per day. 
Company B orders taxis at a rate of 0.75 cars per day. 
a On a given day, find the probability that 2 cars are ordered by company A. (2 marks)
b On a given day, find the probability that the total number of cars ordered by both
companies is 2. (2 marks)
c In a given 5-day working week, find the probability that the total number of cars ordered by
both companies is less than 10. (2 marks) 
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A restaurant has two coffee machines, C and D. Machine C breaks down at a rate of 0.1 
times per week while, independently, machine D breaks down at a rate of 0.05 times per week. 
Find the probability that, in a 12-week period, 


a machine C breaks down at least once (2 marks) 
b each machine breaks down at least once (3 marks) 
c there will be a total of 3 breakdowns. (2 marks) 


A secretary receives internal calls at a rate of 1 every 5 minutes and external calls at a rate 
of 2 every 5 minutes. 
Calculate the probability that the total number of calls is: 


a 3 in a 4-minute period (2 marks)
b at least 2 in a 2-minute period (2 marks) 
c no more than 5 in a 10-minute period. (2 marks) 


An office is situated on 3 floors of a building. On each floor it has a photocopier. The ground-floor
photocopier breaks down at a rate of 0.4 times per week, the first-floor photocopier
breaks down at a rate of 0.2 times per week and the second-floor photocopier breaks down at a
rate of 0.8 times per week. Find the probability, in a given week, that:

a each photocopier will break down exactly once (3 marks)
b at least one photocopier breaks down (3 marks)
c there will be a total of 2 breakdowns. (2 marks) 


During the working day the emails arriving to the account of a company director are classified 
into three types: personal, business and advertising. Personal emails arrive at a mean rate of 
1.8 per hour, business emails arrive at a mean rate of 3.7 per hour and advertising emails arrive 
at a mean rate of 1.5 per hour. Find the probability that she receives: 


a at least one of each type of email during a 30-minute period of the working day (3 marks)
b more than 50 emails in an 8-hour working day. (3 marks)

c Find the probability that she receives more than 50 emails on exactly two days out of a 5-day working week. (3 marks)

Hint: Use a binomial model for part c. ← Statistics and Mechanics Year 1, Chapter 6


Challenge

X ~ Po(λ) and Y ~ Po(μ). The random variable Q = X + Y.
a Prove that P(Q = 0) = e^(−(λ + μ))
b Prove that P(Q = 1) = (λ + μ)e^(−(λ + μ))

2.4 Mean and variance of a Poisson distribution

It can be shown that if the random variable X has a Poisson distribution with parameter λ, then the
mean and variance of X are both equal to λ.
■ If X ~ Po(λ)
• Mean of X = E(X) = λ
• Variance of X = Var(X) = σ² = λ
The fact that the mean is equal to the variance is an important property of a Poisson distribution.


The presence or absence of this property can be a useful indicator of whether or not a Poisson 
distribution is a suitable model for a particular situation. 
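
The result E(X) = Var(X) = λ can be illustrated numerically by applying E(X) = Σ x P(X = x) and Var(X) = E(X²) − (E(X))² to the first few hundred Poisson probabilities. The Python sketch below is an illustration rather than a proof; the function name, the choice of 200 terms and the value λ = 3.7 are assumptions.

```python
from math import exp

def poisson_moments(lam, terms=200):
    """Approximate E(X) and Var(X) for X ~ Po(lam) using the first `terms` probabilities."""
    p = exp(-lam)              # P(X = 0)
    mean = second = 0.0
    for x in range(terms):
        mean += x * p          # contributes x * P(X = x) to E(X)
        second += x * x * p    # contributes x^2 * P(X = x) to E(X^2)
        p *= lam / (x + 1)     # P(X = x + 1) = P(X = x) * lam / (x + 1)
    return mean, second - mean ** 2

mean, var = poisson_moments(3.7)
print(mean, var)   # both should be very close to 3.7
```

The recurrence P(X = x + 1) = P(X = x) × λ/(x + 1) follows from the probability formula and avoids computing large factorials.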


Example 7

A botanist counts the number of daisies, x, in each of 80 randomly selected squares within a field.
The results are summarised below.

Σx = 295, Σx² = 1386

a Calculate the mean and the variance of the number of daisies per square for the 80 squares.
Give your answers to 2 decimal places.
b Explain how the answers from part a support the choice of a Poisson distribution as a model.
c Using a suitable value for λ, estimate the probability that exactly 3 daisies will be found in a
randomly selected square.

a Mean = x̄ = Σx/n = 295/80 = 3.69 (2 d.p.)
Variance = σ² = Σx²/n − x̄²
= 1386/80 − (295/80)²
= 3.73 (2 d.p.)

b Both the mean and the variance are 3.7
correct to one decimal place. The fact that
the mean is close to the variance supports
the choice of a Poisson distribution as a
model.

c Using λ = 3.7
X = the number of daisies per square
X ~ Po(3.7)
P(X = 3) = 0.2087 (4 d.p.)
Use λ = 3.7, which is the mean and variance from part a. This can be calculated using the tables, or your
calculator, with λ = 3.7.

Exercise 2E

1 A student is investigating the numbers of cherries in a fruit scone. A random sample of
100 fruit scones is taken and the results can be summarised as:

Σx = 143, Σx² = 347
a Calculate the mean and the variance of the data. 


b Explain why the results in part a suggest that a Poisson distribution may be a suitable model 
for the number of cherries in a fruit scone. 


c Using a suitable value for A, estimate the probability that exactly 3 cherries will be found in a 
randomly selected fruit scone. 


The number of cars passing a checkpoint during 200 periods of 5 minutes is recorded. 


Number of cars 0 1 2 3 4 5 6 7 8 ≥9
Frequency 7 21 30 41 36 29 21 11 4 0 


a Calculate the mean and the variance of the data. 

b Explain why the results in part a suggest that a Poisson distribution may be a suitable model 
for the number of cars passing the checkpoint in a 5-minute period. 

c Using a suitable value for A, estimate the probability that no more than 2 cars will pass the 
checkpoint in a given 5-minute period. 

d Compare your answer to part c with the relative frequency of obtaining no more than 2 cars
from the sample. 


Tests for flaws are carried out in a textile factory on a consignment of 120 pieces of cloth. 
The results of the tests are shown in the table. 


Number of flaws 0 1 2 3 4 5 6 7 ≥8
Number of pieces 8 19 28 25 19 11 7 3 0 
a Calculate the mean and the variance of the data. (4 marks) 


b Explain why the results in part a suggest that a Poisson distribution may be a suitable model 
for the number of flaws on a piece of cloth. (1 mark) 
The factory produces 10000 pieces of cloth each week, and wants to estimate the number that 
will have 8 or more flaws. 
c Explain why an estimate based on the observed relative frequencies would not be useful. 
(1 mark) 


d Use a Poisson distribution to estimate the number of pieces of cloth with 8 or more flaws. 
(3 marks) 


Challenge 


If X ~ Po(λ), then the distribution of X can be written as:

x          0        1            2             3             ...  r
P(X = x)   e^(−λ)   e^(−λ)λ/1!   e^(−λ)λ²/2!   e^(−λ)λ³/3!   ...  e^(−λ)λ^r/r!

Using this distribution show that E(X) = λ and Var(X) = λ.

2.5 Mean and variance of the binomial distribution

You need to know how to calculate the mean
and variance of a binomial random variable.
■ If X is a binomial random variable with
X ~ B(n, p), then:
• Mean of X = E(X) = μ = np
• Variance of X = Var(X) = σ² = np(1 − p)


The probability mass function for a binomial random variable X ~ B(n, p) is:
P(X = r) = C(n, r) p^r (1 − p)^(n − r),   r = 0, 1, 2, ..., n
where C(n, r) is the binomial coefficient.

Example 8

A fair, five-sided spinner is spun 20 times. The random variable X represents the number of 5s
obtained.
a Find the mean and variance of X.
b Find P(X < μ − σ).

X = number of 5s obtained in 20 spins of the spinner
X ~ B(20, 0.2)
Define the random variable carefully. The value of p is 1/5 = 0.2, as it is a five-sided
spinner and assumed to be fair.

a E(X) = μ = np = 20 × 0.2 = 4
Var(X) = σ² = np(1 − p)
= 20 × 0.2 × 0.8 = 3.2

b σ = √3.2 = 1.788...
P(X < μ − σ) = P(X < 4 − 1.788...)
= P(X < 2.211...)
= P(X ≤ 2)
= 0.2061
As X can only take integer values, P(X < 2.211...) = P(X ≤ 2).
P(X ≤ 2) can be found using your calculator or binomial tables. Remember that p = 0.2, n = 20.

Example 9

A company produces a certain type of delicate component. The probability of any one component
being defective is p. The probability of obtaining at least one defective component in a sample of 4
is 0.3439.
The company produces 600 components in a day.

Find the mean and variance of the number of defective components produced per day.

X = the number of defective components in a sample of 4
X ~ B(4, p)
Define your random variable. To answer this question we need first to be able to find a value for p.

P(X ≥ 1) = 0.3439
The probability of obtaining at least one defective item in a sample of 4 is 0.3439.
1 − P(X = 0) = 0.3439
P(X = 0) = 0.6561
(1 − p)⁴ = 0.6561
P(X = 0) = C(4, 0) p⁰ (1 − p)⁴ = 1 × 1 × (1 − p)⁴ = (1 − p)⁴
1 − p = 0.9
p = 0.1

Y = the number of defectives in 600 components
Y ~ B(600, 0.1)
Define a new random variable using the value of p you obtained earlier.

Mean = E(Y) = np = 600 × 0.1 = 60
Variance = Var(Y) = np(1 − p)
= 600 × 0.1 × 0.9 = 54
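
The calculations in the spinner and component examples can be reproduced with a few lines of Python. This is an illustrative sketch only: the function names are assumptions, and `math.comb` (available from Python 3.8) supplies the binomial coefficient.

```python
from math import comb, sqrt

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomial_mean_var(n, p):
    """Return (np, np(1 - p)), the mean and variance of B(n, p)."""
    return n * p, n * p * (1 - p)

# Spinner example: X ~ B(20, 0.2)
mu, var = binomial_mean_var(20, 0.2)
sigma = sqrt(var)
# P(X < mu - sigma): add the probabilities of the integers strictly below mu - sigma
p_below = sum(binomial_pmf(r, 20, 0.2) for r in range(21) if r < mu - sigma)

# Component example: (1 - p)^4 = 0.6561 gives p, then Y ~ B(600, p)
p = 1 - 0.6561 ** 0.25
mean_y, var_y = binomial_mean_var(600, p)

print(mu, var, round(p_below, 4))
print(round(p, 4), round(mean_y, 2), round(var_y, 2))
```

Because X only takes integer values, filtering on r < μ − σ picks out r = 0, 1 and 2, which matches the step P(X < 2.211...) = P(X ≤ 2).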


Exercise 2F

1 X is the random variable such that X ~ B(12, 0.7). Find:
a E(X) b Var(X)

X is the random variable such that X ~ B(n, 0.4) and E(X) = 3.2. Find:
a the value of n b P(X = 5) c P(X ≤ 2)

X is the random variable such that X ~ B(10, p) and Var(X) = 2.4.
Find the two possible values of p.

X is the random variable such that X ~ B(15, p) and Var(X) = 2.4.
Find the two possible values of p.

X is the random variable such that X ~ B(n, p), E(X) = 4.8, Var(X) = 2.88.
Find the values of n and p.

The probability of obtaining a head when a biased coin is spun is p, where p < 1/2.

An experiment consists of spinning the coin 20 times and recording the number of heads. In a 

large number of experiments the variance of the number of heads is found to be 4.2. 

a Estimate the value of p. (2 marks) 

b Hence estimate the probability that exactly 7 heads are recorded during a particular 
experiment. (2 marks) 


The probability that a canvasser gets a reply when she knocks on the door of a house is 0.65. 
a Find the probability that in a street of 10 houses she receives: 
i exactly 5 replies 
ii at least 5 replies. 
b i How many houses should she canvas such that the random variable 
X = ‘number of replies’ has a mean of 78. 
ii What is the variance in this case? 


A sweet company produces chocolate-covered wafer biscuit bars. The probability of a bar being 
solid chocolate is 0.04. 


a Find the probability that in a box of 48 bars, at least two are solid chocolate. (3 marks) 

The company produces 120 boxes of biscuits per day. 

b Find the mean and variance of the number of boxes of biscuits which contain at least two
solid chocolate bars. (2 marks)

9 The random variable X is such that X ~ B(5, p). Given that P(X ≥ 1) = 0.83193, find:
a the value of p (3 marks)
b E(X) and Var(X). (2 marks)

10 A biased dice is thrown 5 times and the number of sixes is noted. The experiment is conducted
500 times. The results are shown in the table. 


Number of sixes 0 1 2 3 4 5 
Frequency 163 208 98 28 3 0 


A student wishes to show that the data can be modelled by a binomial distribution. 
a Calculate the mean and variance of the number of sixes in 5 throws of the dice. 
b Based on the mean of the data, estimate the probability p of getting a six with this dice. 


c Using the value found in part b, calculate the expected frequencies of 0, 1, 2, 3, 4 and 5 sixes 
in 500 experiments, using a binomial distribution with parameters n = 5 and p. Comment on 
the student’s suggestion. 


d How does the variance of the data support the use of a binomial distribution? 


Challenge 


If X ~ B(3, p), prove that:
a E(X) =3p 
b Var(X) = 3p(1 - p) 


2.6 Using the Poisson distribution to approximate the binomial distribution

Evaluating binomial probabilities when n is large can be quite difficult and in such situations it is
sometimes useful to use an approximation.

■ If X ~ B(n, p) and
• n is large
• p is small
then X can be approximated by Po(λ), where λ = np.

If p is close to 0.5 you can use a normal distribution to model a binomial distribution with
large values of n. ← Statistics and Mechanics Year 2, Chapter 3

There is no clear rule as to what constitutes ‘large n’ or ‘small p’ but usually the value for np will be < 10.
Generally, the larger the value of n and the smaller the value of p, the better the approximation will
be. In this situation, (1 − p) will be close to 1, so Var(X) = np(1 − p) will be close to the mean of the
distribution, E(X) = np. This satisfies the condition for a Poisson distribution model that the mean and
variance are close.


In general, a question will state whether you need to use a Poisson approximation. 




Example 


The random variable X ~ B(200, 0.03).
a Find P(X = 4).
A Poisson variable Y ~ Po(λ) is used to approximate X.

b Write down the value of λ and justify the use of a Poisson approximation in this context.
c Find P(Y = 4) and comment on the accuracy of the approximation.


a $P(X = 4) = \binom{200}{4}(0.03)^4(0.97)^{196} = 0.1338$ (4 d.p.)

b Under a Poisson approximation,
Y ~ Po(200 × 0.03), i.e. λ = 6
As n is large and p is small, X can be
approximated by Po(np).

c $P(Y = 4) = \dfrac{e^{-6} \times 6^4}{4!} = 0.1339$ (4 d.p.)
The answer obtained from the Poisson
approximation is close to the value obtained
from the underlying binomial distribution, so
the approximation is accurate.

Compare your answers for parts a and c.
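The comparison in this example can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binomial_pmf` and `poisson_pmf` are illustrative choices, not notation from the text.

```python
from math import comb, exp, factorial


def binomial_pmf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(lam: float, x: int) -> float:
    """P(Y = x) for Y ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 200, 0.03
exact = binomial_pmf(n, p, 4)      # part a: exact binomial probability
approx = poisson_pmf(n * p, 4)     # part c: Poisson approximation with lambda = np

print(round(exact, 4), round(approx, 4))
```

The two values agree to three decimal places, which is the sense in which the approximation is described as accurate.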


Example 


The probability of a component produced by a certain machine being faulty is 0.007. The number 
of faulty components in a batch of 1000 components is noted. 


a Find the probability that exactly 6 components are faulty. 
b Use a Poisson approximation to find the probability that more than 7 components are faulty. 


c Explain why the approximation in part b is valid.


a X = the number of faulty components in a
batch of 1000 (Define the random variable.)
X ~ B(1000, 0.007)

$P(X = 6) = \binom{1000}{6} \times 0.007^6 \times 0.993^{994} = 0.1494$ (4 d.p.)

b Under a Poisson approximation,
X ~ Po(1000 × 0.007), i.e. X ~ Po(7)
P(X > 7) = 1 − P(X ≤ 7) = 1 − 0.5987
= 0.4013
This value can be calculated from tables or using
your calculator.

c The approximation in part b is valid as n is
large and p is small.
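As a cross-check on the figures in this example, the sketch below computes the exact binomial probability for part a and the Poisson tail probability for part b. It is an illustration only: `poisson_cdf` is a hypothetical helper that sums the Poisson probabilities directly rather than reading them from tables.

```python
from math import comb, exp, factorial


def poisson_cdf(lam: float, x: int) -> float:
    """P(X <= x) for X ~ Po(lam), by summing the individual probabilities."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))


n, p = 1000, 0.007

# Part a: exact binomial probability of exactly 6 faulty components.
exact_six = comb(n, 6) * p**6 * (1 - p) ** (n - 6)

# Part b: Poisson approximation with lambda = np = 7, P(X > 7) = 1 - P(X <= 7).
more_than_seven = 1 - poisson_cdf(7, 7)

print(round(exact_six, 4), round(more_than_seven, 4))
```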
1 The random variable X ~ B(100, 0.05).
a Calculate:
i P(X = 4) ii P(X < 2)
b Use a Poisson approximation to find estimates for the probabilities calculated in part a. 


The random variable X ~ B(150, 0.04).

a Calculate:
i P(X = 5) ii P(X < 3)


b Use a Poisson approximation to find estimates for the probabilities calculated in part a. 


The random variable Y ~ B(200, 0.98).

a Calculate:
i P(Y = 197) ii P(Y > 198)
b Use a Poisson approximation to find estimates for the probabilities calculated
in part a.

Problem-solving: Create a variable X ~ B(200, 0.02) which satisfies the conditions
for a Poisson approximation. Hence P(Y = 197) becomes P(X = 3).


There are 800 pupils in a school.
Find the probability that exactly 4 of them
have a birthday on 1 April:

a by using a binomial distribution
b by using a Poisson approximation.

c Comment on your answers to parts a and b.

Hint: If X = 'number of pupils out of 800 having a
birthday on 1 April' then X ~ B(800, 1/365).


In a manufacturing process the proportion of defective items is 3%. For a batch of 100 articles, 
use a Poisson approximation to find the probability that: 


a there are fewer than 4 defective items 
b there are exactly 2 defective items. 


A medical practice screens a random sample of 180 of its patients for a certain condition which 
is present in 2% of the population. Using a Poisson approximation, find the probability that 
they find: 


a one patient with the condition 
b at least two patients with the condition. 


A researcher has suggested that 1 in 120 people is likely to catch a particular virus. 

Assuming that a person catching the virus is independent of any other person catching it, 

a find the probability that in a random sample of 20 people, exactly one of them catches the 
virus. 

b Using a Poisson approximation, estimate the probability that in a random sample of 900 
people fewer than 7 catch the virus. 
From company records, a manager knows that the probability that a defective article is
produced by a particular production line is 0.025. 
A random sample of 10 articles is selected from the production line. 


a Find the probability that exactly 1 of them is defective. 
On another occasion, a random sample of 120 articles is taken. 
b Using a Poisson approximation, find the probability that fewer than 4 of them are defective. 


A manufacturer produces large quantities of pots. 5% of the pots produced are chipped. 
A random sample of 10 pots was taken from the production line. 


a Define a suitable distribution to model the number of chipped pots in this sample. 
b Find the probability that there were exactly 3 chipped pots in the sample. 
A new random sample of 140 pots was taken. 


c Find the probability that there were between 6 and 9 (inclusive) chipped pots in this sample, 
using a Poisson approximation. 


The probability that a tomato plant grows over 2 metres high is 0.08. A random sample of 
50 tomato plants is taken and each tomato plant is measured and its height recorded. 

Find, using a Poisson approximation, the probability that the number of tomato plants over 
2 metres high is between 5 and 8 (inclusive). 


Each cell of a certain insect contains 1200 genes. It is known that each gene has a probability 
0.005 of being damaged. A cell is chosen at random. 


a Suggest a suitable model for the distribution of the number of damaged genes in the cell. 
(1 mark) 


b Find the mean and variance of the number of damaged genes in the cell. (2 marks) 


c Using a Poisson approximation, find the probability that there are at most 4 damaged 
genes in the cell. (3 marks) 


A machine which manufactures nails is known to produce 2.5% defective nails. The nails are 
sold in packets of 200. 


a Using a Poisson approximation, calculate the probability that a packet contains more than 
6 defective nails. (3 marks) 


A carpenter buys 6 packets of nails. 


b Estimate the probability that more than half of these packets contain more than 
6 defective nails. (4 marks) 


The probability of an electrical component being defective is 0.0125. 
The component is supplied in boxes of 400. 


a Using a Poisson approximation, estimate the probability that there are more than 3 defective 
components in a box. (3 marks) 


A retailer buys 5 boxes of components. 


b Estimate the probability that there are more than 3 defective components in 3 of 
the boxes. (3 marks) 
It is claimed that 95% of the letters posted 1st class arrive the next day. Based on this claim,
calculate, using a Poisson approximation, the probability that in a sample of 180 letters, 


a more than 173 arrive the day after posting (3 marks) 
b fewer than 168 arrive the day after posting. (3 marks) 


A farmer supplies a bakery with eggs. The manager of the bakery claims that the proportion of 

eggs which are broken on delivery is 1%. The farmer supplies the eggs to the bakery on a daily 

basis in consignments of 150 eggs. 

a Based on the claim of the manager, calculate, using a Poisson approximation, the probability 
that a consignment contains more than 4 broken eggs. (3 marks) 

The farmer supplies a consignment to the baker on 5 days every week. 

b Calculate the probability that in a particular week one of the consignments contains more 
than 4 broken eggs. (4 marks) 


Mixed exercise 2


1 On a stretch of road, accidents occur at a rate of 0.7 per month.


Find the probability of: 

a no accidents in the next month (2 marks) 
b exactly 2 accidents in the next 3-month period (2 marks) 
c no accidents in exactly 2 of the next 6 months. (3 marks) 


The random variable X is the number of misprints per chapter in the first edition of a new 
textbook. 


a State two conditions under which a Poisson distribution is a suitable model for X. (2 marks)


The number of misprints per chapter has a Poisson distribution with mean 2.25. Find the 
probability that: 


b a randomly chosen chapter has no more than one misprint (3 marks)
c the total number of misprints in 2 randomly chosen chapters is more than 6. (3 marks) 


The random variable Y ~ Po(λ).
Find the value of λ such that P(Y = 5) is 1.25 times the value of P(Y = 3). (3 marks)


A company receives emails at a mean rate of 3 every 5 minutes. 


a Give two reasons why a Poisson distribution could be a suitable model for the number of 


emails received. (2 marks) 
b Calculate the probability that, in a 10-minute period, the company receives: 

i exactly 7 emails (2 marks) 

ii at least 8 emails. (2 marks) 
a State the conditions under which the Poisson distribution may be used as an approximation

to the binomial distribution. (2 marks) 
Left-handed people make up 8% of a population. A random sample of 50 people is taken from 
this population. The discrete random variable X represents the number of left-handed people in 
the sample. 


b Calculate P(X < 3). (3 marks)
c Using a Poisson approximation, estimate P(X < 3). (3 marks)
d Calculate the percentage error in using the Poisson approximation. (2 marks) 


The number of telephone calls per hour
received by a small business is a random
variable with distribution Po(λ) where λ is
an integer. Natalia records the number of calls, Y, received in an hour.

Given that P(Y > 10) < 0.1, find the largest possible value of λ. (3 marks)


Hint: Use the Poisson distribution tables.


The probability of a plant cutting successfully taking root is 0.075. Find the probability that, 
in a batch of 20 randomly selected plant cuttings, the number taking root will be: 


a i exactly 2 (2 marks) 
ii more than 4. (2 marks) 


A second random sample of 80 plant cuttings is selected. 


b Using a Poisson approximation, estimate the probability of at least 8 plant cuttings taking 
root. (3 marks) 


An angler is known to catch fish at a mean
rate of 2 per hour. The number of fish
caught by the angler in an hour follows a
Poisson distribution.

The angler takes 5 fishing trips, each lasting 2 hours.

Find the probability that the angler catches at least 5 fish on exactly 3 of these trips. (5 marks)

Hint: Find the probability that the angler catches at least 5 fish on a single trip.
Then use this value as the parameter p in a binomial distribution.


The number of cherries in a Megan’s fruit cake follows a Poisson distribution with mean 2.5. 
A Megan's fruit cake is to be selected at random. Find the probability that it contains: 


a i exactly 4 cherries (2 marks) 
ii at least 3 cherries. (2 marks) 


Megan's fruit cakes are sold in packets of 4. 


b Calculate the probability that there are more than 12 cherries, in total, in a randomly 
selected packet of Megan's fruit cakes. (3 marks) 


Eight packets of Megan’s fruit cakes are selected at random. 
c Find the probability that exactly 2 packets contain more than 12 cherries. (3 marks) 


A car salesman sells cars at a mean rate of 6 per week. 


a Suggest a suitable model to represent the number of cars sold in a randomly chosen week. 
Give two reasons to support your model. (2 marks) 


b Find the probability that in any randomly chosen week the salesman sells exactly 
5 cars. (2 marks) 
c Find the probability that in a period of 4 consecutive weeks there are exactly 2 weeks in
which the salesman sells exactly 5 cars. (3 marks) 


Abbie and Ben share a flat. Abbie receives letters at a mean rate of 1.2 letters per day while Ben 
receives letters at a rate 0.8 letters per day. Assuming their letters are independent, calculate the 
probability that on a particular day: 


a each receives at least 1 letter (3 marks)
b they receive a total of 3 letters between them. (2 marks) 
Given that post is delivered to the flat from Monday to Friday, 


c find the probability that in one particular week they receive a total of 3 letters on at least 3 of 
the days. (4 marks) 


An electrical outlet sells desktop and laptop computers. The desktops are sold at a mean rate of 
2.4 per day and the laptops are sold at a mean rate of 1.6 per day. Calculate the probability that 
on a particular day the outlet sells: 


a at least 2 desktops and at least 2 laptops (3 marks) 
b a combined total of 6 computers. (2 marks)


c Calculate the probability that over a two-day period they sell a combined total of no more 
than 6 computers. (3 marks) 


An airline knows that overall 4% of passengers do not turn up for flights. The airline has a 
policy of selling more tickets than there are seats on a flight. For an aircraft with 148 seats, the 
airline sold 150 tickets for a particular flight. 


a Write down a suitable model for the number of passengers who do not turn up for this 


flight after buying a ticket. (2 marks) 
By using a Poisson approximation, find the probability that: 
b more than 148 passengers turn up for this flight (2 marks) 
c there is at least one empty seat on this flight. (3 marks) 


A receptionist routes incoming telephone calls to rooms within a hotel. The probability of the 
caller being connected to the wrong room is 0.02. 


a Find the probability that more than 1 call in 10 consecutive calls is connected to the
wrong room. (3 marks) 


The receptionist receives 500 calls each day for guests in the hotel. 
b Find the mean and variance of the number of wrongly connected calls. (2 marks) 


c Use a Poisson approximation to find the probability that fewer than 8 calls each day are 
connected to the wrong room. (3 marks) 


A disease occurs in 2.5% of a population. 
a Find the probability of exactly 2 people having the disease in a random sample of 

10 people. (2 marks) 
b Find the mean and variance of the number of people with the disease in a random sample 

of 120 people. (2 marks) 
A doctor tests a random sample of 120 patients for the disease. He decides to offer all patients a
vaccination to protect them from the disease if more than 6 of the sample have the disease. 


c Using a Poisson approximation, find the probability that the doctor will offer all patients a 
vaccination. (3 marks) 


Accidents occur randomly at a roundabout at a rate of 15 every year. 


a Find the probability that there will be fewer than 5 accidents at the roundabout in a 6-month
period. (2 marks) 


b Find the probability that there will be at least 1 accident in a single month. (2 marks) 


c Find the probability that there is at least 1 accident in exactly 4 months of a 6-month period. 
(3 marks) 


An office photocopier breaks down randomly at a rate of 8 times per year. 
a Find the probability that there will be exactly 2 breakdowns in the next month. (2 marks) 
b Find the probability of at least 2 breakdowns in 3 of the next 4 months. (3 marks) 


A holiday website receives visits at a rate of 240 per hour. 
a State a distribution that is suitable to model the number of visits obtained during a 


1-minute interval, and justify your choice of distribution. (3 marks) 
Find the probability of: 
b 8 visits in a given minute (2 marks) 
c at least 10 visits in 2 minutes. (2 marks)


The number of policies sold by a life insurance company employee each week over a 150-week 
period is recorded. 


Number of policies sold 0 1 2 3 4 5 6 7 8 
Number of weeks 10 23 35 33 24 14 7 3 
a Calculate the mean and the variance of the data. (3 marks) 


b Explain why the results in part a suggest that a Poisson distribution may be a suitable 
model for the number of policies sold in a week. (1 mark) 


c Use a Poisson distribution to estimate the probability that no more than 2 policies will be 
sold in a given week. (3 marks) 


Challenge 


During normal operational hours, planes land at an airport at an 
average rate of one every four minutes. 


Given that exactly 10 planes landed at the airport between 2 pm and 
3 pm, find the probability that 


a exactly 5 planes landed between 2 pm and 2.30 pm. 


b more than 7 planes landed between 2 pm and 2.30 pm. 
Summary of key points


1 If X ~ Po(λ), then the Poisson distribution is given by:
$P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$, x = 0, 1, 2, ...

In order for the Poisson distribution to be a good model, the events must occur:

• independently
• singly, in space or time
• at a constant average rate, in that the mean number in an interval is proportional to the
length of the interval


If two Poisson variables X and Y are independent, the variable Z = X + Y also has a Poisson
distribution.
If X ~ Po(λ) and Y ~ Po(μ), then X + Y ~ Po(λ + μ)


If X has a Poisson distribution with X ~ Po(λ), then:
• Mean of X = E(X) = λ
• Variance of X = Var(X) = σ² = λ


If X has a binomial distribution with X ~ B(n, p), then:
• Mean of X = E(X) = μ = np
• Variance of X = Var(X) = σ² = np(1 − p)


If X has a binomial distribution with X ~ B(n, p), and
• n is large
• p is small
then X can be approximated by Po(λ), where λ = np.


After completing this chapter, you should be able to:
• Understand and use the geometric distribution
• Calculate and use the mean and variance of the geometric distribution
• Understand and use the negative binomial distribution
• Calculate and use the mean and variance of the negative binomial distribution


1 X ~ B(20, 0.4)
a Find:
i P(X = 10) ii P(X < 8) iii P(X = 12)
b Calculate:
i E(X) ii Var(X)
← Statistics and Mechanics Year 1, Chapter 6

2 A fair dice is rolled repeatedly.
Find the probability that a six first appears
on the 3rd roll.
← Statistics and Mechanics Year 1, Chapter 5

The geometric distribution can be used
to model the number of times a learner
driver needs to take their test before
passing. → Exercise 3B, Q3
3.1 The geometric distribution


If you are carrying out successive, independent trials, each with the same probability of success, you 
can model the number of trials needed to achieve a single success using the geometric distribution. 
For example, if you are rolling a fair dice repeatedly, the number of rolls needed before a six is rolled 
can be modelled using the geometric distribution. This might be particularly useful if you are playing 
a board game where you have to roll a specified number to start the game. 


Johan is playing a board game where he has to roll a six to start. Find the probability that Johan
starts the board game on his fourth roll.

$P(\text{six}) = \frac{1}{6}$
$P(\text{not six}) = \frac{5}{6}$
P(first six occurs on fourth roll)
$= \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} = \frac{125}{1296}$

Online: Explore the geometric distribution using GeoGebra.

You can define a discrete random variable, X, as the number of rolls needed to obtain a six.
X has the following probability distribution:

x          1      2            3               4               ...
P(X = x)   1/6    5/6 × 1/6    (5/6)² × 1/6    (5/6)³ × 1/6    ...

Note that X can take any positive integer value. In practice, as x gets large, P(X = x) gets smaller and
smaller, and tends towards 0.


■ For successive independent trials, each with constant probability of success, p, the number
of trials needed to get one success, X, has the geometric distribution, with probability
function:

$P(X = x) = p(x) = p(1 - p)^{x-1}$, x = 1, 2, 3, ...

Notation: You write X ~ Geo(p).
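The probability function can be evaluated exactly with rational arithmetic. The sketch below is an illustration, not part of the text: `geometric_pmf` is a hypothetical helper name, and Python's `fractions.Fraction` is used so that the result for Johan's board game comes out as an exact fraction.

```python
from fractions import Fraction


def geometric_pmf(p, x):
    """P(X = x) for X ~ Geo(p): x - 1 failures followed by one success."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return p * (1 - p) ** (x - 1)


p_six = Fraction(1, 6)

# Probability that the first six occurs on the fourth roll.
print(geometric_pmf(p_six, 4))

# Consecutive probabilities differ by the common ratio (1 - p).
print(geometric_pmf(p_six, 5) / geometric_pmf(p_six, 4))
```

Passing a float such as 0.6 instead of a `Fraction` works in the same way, at the cost of floating-point rounding.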


You can see that the values of p(x) form a geometric sequence with first term, p, and common ratio,
(1 − p). You can derive the cumulative geometric distribution by considering the sum of the terms
of the geometric series.

$P(X \le x) = \sum_{r=1}^{x} p(r) = \dfrac{p\left(1 - (1-p)^x\right)}{1 - (1-p)} = 1 - (1-p)^x$

The sum of the first n terms of a geometric series with first term a and common
ratio r is $S_n = \dfrac{a(1 - r^n)}{1 - r}$. ← Pure Year 2, Chapter 3


■ If X ~ Geo(p), then the cumulative geometric distribution is given by:

$P(X \le x) = 1 - (1-p)^x$, x = 1, 2, 3, ...

Since P(X ≤ x) + P(X > x) = 1, you can also deduce that $P(X > x) = (1-p)^x$. This corresponds to the
situation where the first x trials result in failure. This result can also be written $P(X \ge x) = (1-p)^{x-1}$.
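The closed form for the cumulative distribution can be checked against a direct sum of the probability function. This is a minimal sketch under the assumption p = 1/6 (the dice example); the function names are illustrative, and exact fractions are used so the comparisons are free of rounding error.

```python
from fractions import Fraction


def geometric_pmf(p, x):
    """P(X = x) for X ~ Geo(p)."""
    return p * (1 - p) ** (x - 1)


def geometric_cdf(p, x):
    """P(X <= x) for X ~ Geo(p), using the closed form 1 - (1 - p)^x."""
    return 1 - (1 - p) ** x


p = Fraction(1, 6)
for x in range(1, 11):
    # Summing the first x terms of the geometric series gives the closed form.
    assert sum(geometric_pmf(p, r) for r in range(1, x + 1)) == geometric_cdf(p, x)
    # P(X > x): the first x trials all fail.
    assert 1 - geometric_cdf(p, x) == (1 - p) ** x
    # P(X >= x) = P(X > x - 1): the first x - 1 trials all fail.
    assert 1 - geometric_cdf(p, x - 1) == (1 - p) ** (x - 1)

print(geometric_cdf(p, 4))  # probability of a six within the first four rolls
```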


Example


The probability that Genevieve passes her driving test on any one attempt is 0.6. 
a Find the probability that: 

i she passes on her fifth attempt 

ii she needs five or fewer attempts to pass 

iii she needs more than five attempts to pass. 


b State two assumptions you have used in your calculations. 


a i Let X = number of attempts needed.
X ~ Geo(0.6)
$P(X = 5) = 0.6 \times (1 - 0.6)^4 = 0.01536$
ii $P(X \le 5) = 1 - (1 - 0.6)^5 = 0.98976$
iii $P(X > 5) = 1 - P(X \le 5) = 1 - 0.98976 = 0.01024$
b Each attempt is independent and has the
same probability.

Online: Explore the cumulative geometric
distribution using GeoGebra.


Exercise 3A


1 The random variable X ~ Geo(0.15). Find:
a P(X = 10) b P(X < 7) c P(3 < X < 12)


2 The random variable Y ~ Geo(0.23). Find: 
a P(Y = 6) b P(Y = 4) c P(2 < Y < 8)


3 Alice rolls a fair 6-sided dice. She records X, the number of rolls it takes to get a 1. Given that 
each roll is independent, find: 


a P(X = 4) b P(X < 3) c P(X = 5) d P(2 < X < 6)


4 Bernhard has to pass an examination to get into law school. He can take the examination as 
many times as he likes and his probability of passing on any one attempt is 0.3. 


a Find the probability that: 


i he passes on his third attempt (2 marks) 
ii he takes at least four attempts to pass. (3 marks) 
b State two assumptions that you have used in your model. (2 marks) 
Carolina is playing a board game with a fair four-sided dice. She must roll a 4 to start.


a Given that each roll is independent, find the probability that: 
i she starts on her first go 
ii she starts on her fifth go 
iii she takes no more than four attempts to start. 


b State one assumption you have used in your model. 


Donald is taking part in a competition where he has to complete a task within a given time 
limit. He can have as many attempts as he likes. The probability that he completes the task in 
the given time on each attempt is 0.45. He uses X to represent the number of attempts he needs 
and finds that P(X = x) = 0.136125. Given that each attempt is independent, calculate:


a the value of x 
b the probability that Donald completes the task within 4 attempts. 


X ~ Geo(0.032) 

a Given that P(X = x) = 0.0203 (4 d.p.), find the value of x.

Find:

b the largest value of x such that P(X ≤ x) < 0.1

c the smallest value of x such that P(X ≥ x) < 0.05.

Hint: $P(X \ge x) = (1-p)^{x-1}$


Edith works for a computer company on a telephone help desk. Callers either report a problem 
with their hardware or their software. The probability that a randomly chosen caller reports 

a problem with their hardware is 0.1. Given that each call is independent, find the probability 
that: 


a the first caller with a hardware problem is Edith’s 7th caller 


b the first call from a caller with a hardware problem comes in after the 5th call.


Isabelle is surveying people about their eating habits. She asks people who pass her by in the 
street, at random, whether they like squid pizza. Given that the probability that a randomly 
chosen person likes squid pizza is 0.05 and that each person is independent, find the probability 
that: 


a the first person to like squid pizza is the 10th person she asks 


b she asks at least 15 people before finding someone who likes squid pizza. 


Frances, Georgina and Holly agree who is to do the washing up by playing a game. They each 
roll a fair six-sided dice and record whether the outcome is odd or even. If two of them get the 
same outcome and the third one gets a different outcome, that person has to wash up. If no 
decision is reached they play again. 

Given that each game is independent, find the probability that: 


a a decision is reached on the third game (4 marks) 


b it takes at least four games to reach a decision. (2 marks) 




(E/P) 11 Julian works in a village pharmacy and he finds that the long-term average number of
customers per hour is 4. Find the probability that:


a at least 5 customers come in during a randomly chosen hour. (2 marks) 
He records the number of customers during each hour on a particular day. 
Find the probability that: 
b the first occurrence of 5 or more customers is in the 5th hour (2 marks) 
c he goes through a whole 8-hour shift with no single hour having 5 or more 

customers. (3 marks) 


3.2 Mean and variance of a geometric distribution


You need to be able to make use of the following results about the mean (or expected value) and 
variance of a geometrically distributed random variable: 


■ If X ~ Geo(p), then:

• Mean of X = $E(X) = \mu = \dfrac{1}{p}$

• Variance of X = $\text{Var}(X) = \sigma^2 = \dfrac{1-p}{p^2}$
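These results are quoted rather than proved here. One standard way to obtain them, offered only as a sketch that may go beyond what this section requires, is to differentiate the geometric series term by term; the shorthand q = 1 − p is introduced just for this derivation.

```latex
\begin{align*}
E(X) &= \sum_{x=1}^{\infty} x\,p\,q^{x-1}
      = p\,\frac{d}{dq}\left(\sum_{x=0}^{\infty} q^{x}\right)
      = p\,\frac{d}{dq}\left(\frac{1}{1-q}\right)
      = \frac{p}{(1-q)^{2}} = \frac{1}{p} \\[4pt]
E\bigl(X(X-1)\bigr) &= \sum_{x=1}^{\infty} x(x-1)\,p\,q^{x-1}
      = p\,q\,\frac{d^{2}}{dq^{2}}\left(\frac{1}{1-q}\right)
      = \frac{2pq}{(1-q)^{3}} = \frac{2q}{p^{2}} \\[4pt]
\operatorname{Var}(X) &= E\bigl(X(X-1)\bigr) + E(X) - \bigl(E(X)\bigr)^{2}
      = \frac{2q}{p^{2}} + \frac{1}{p} - \frac{1}{p^{2}}
      = \frac{2q + p - 1}{p^{2}} = \frac{q}{p^{2}} = \frac{1-p}{p^{2}}
\end{align*}
```

The last step uses p − 1 = −q, so the numerator 2q + p − 1 reduces to q.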


Example


Dorothy flips a biased coin until it lands on heads. She records the total number of flips, Y. 
Given that the mean of Y is 2.5, find: 


a_ the probability of the coin landing on heads on a single flip 
b the standard deviation of Y. 


a Y ~ Geo(p)
$E(Y) = \dfrac{1}{p} = 2.5$
$p = \dfrac{1}{2.5} = 0.4$
The probability of the coin landing on
heads is 0.4.
b $\text{Var}(Y) = \dfrac{1-p}{p^2} = \dfrac{1 - 0.4}{0.4^2} = 3.75$
$\sigma = \sqrt{3.75} = 1.94$ (3 s.f.)

Watch out: The question asks for the standard
deviation, so find Var(Y) and then find the
square root.
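The arithmetic in this example can be reproduced in a few lines. The sketch below is an illustration only; `geometric_mean` and `geometric_variance` are hypothetical helper names that simply encode the two quoted results for X ~ Geo(p).

```python
from math import sqrt


def geometric_mean(p: float) -> float:
    """E(X) = 1 / p for X ~ Geo(p)."""
    return 1 / p


def geometric_variance(p: float) -> float:
    """Var(X) = (1 - p) / p^2 for X ~ Geo(p)."""
    return (1 - p) / p**2


mean_flips = 2.5
p_heads = 1 / mean_flips                 # part a: rearrange E(Y) = 1/p
variance = geometric_variance(p_heads)   # part b: variance first ...
std_dev = sqrt(variance)                 # ... then its square root

print(round(p_heads, 2), round(variance, 2), round(std_dev, 2))
```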


Exercise 3B


1 The random variable X ~ Geo(0.2). Find:
a E(X) b Var(X)

2 Zachariah rolls a fair six-sided dice and records X, the number of rolls it takes for him to get a
multiple of 3. Given that each roll is independent, find: 


a E(X) b Var(X) 


3 The probability that Yolanda passes her driving test at any one attempt is 0.65. Given that each 
attempt is independent, find the probability that: 
a she passes on the third attempt b it takes at least four attempts to pass. 
c Find:
i the expected number of times Yolanda will have to take her test 
ii the variance. 


4 The geometrically distributed random variable X ~ Geo(p), has E(X) = 4. Find:
a p b Var(X) 


(P) 5 Xavier is practising shooting a basketball from the 'free throw' line and records the number of
throws, X, that he takes to get a basket each time. Given that each throw is independent and
that Var(X) = 20, find: 


a the probability that he hits a basket each throw b E(X) 


(P) 6 Wilma is a charity collector and goes door-to-door trying to raise money. Given that the 
probability of her getting a donation at each house is p, that each house call is independent and 
the variance is 380, find: 
a p
b the expected number of house calls Wilma must make before getting a donation. 


(E) 7 Vincent records X, the number of attempts it takes him to parallel-park his car in a particular 
space. 
a State a suitable probability distribution for X. (1 mark) 
b State two assumptions that must be made for this distribution to be appropriate. (2 marks) 
Given that P(X = 2) = 0.16 and that p < 0.5, find: 


c p, the probability that Vincent parks correctly on each single attempt (3 marks) 
d the expected number of attempts Vincent takes (1 mark) 
e the variance of X. (1 mark) 


(E/P) 8 Uma has a bag of marbles, 15% of which are blue. She puts her hand in the bag and pulls out a 
marble at random. If the marble is not blue, she puts it back in the bag and tries again. 


a Calculate: 


i the mean 
ii the variance of the number of marbles she pulls out, up to and including the first 
blue one. (2 marks) 
Calculate the probability that: 
b she pulls out 4 marbles (2 marks) 
c she pulls out at least 8 marbles (3 marks) 
d_ she pulls out fewer marbles than expected. (2 marks) 
(E/P) 9 Tabitha the cat is trying to catch fish out of a garden pond. The probability that she catches a
fish at each attempt is 0.12.


a State two assumptions that must be made to model this situation as a geometric 


distribution. (2 marks) 
b Find: 

i the probability that Tabitha takes two attempts to catch a fish (2 marks) 

ii the probability that Tabitha takes at least three attempts to catch a fish. (3 marks) 


c Find the expected number of attempts Tabitha takes to catch a fish and the variance. (2 marks) 
Once Tabitha has caught one fish, she tries to catch a second one. 
d Find the probability that she needs three attempts to catch her first fish, and three further 


attempts to catch her second fish. (3 marks) 
e Find the probability that she caught the first fish on the second attempt, and the second fish 
on the fifth attempt. (3 marks) 


(E/P) 10 Simeon, a tailor, records the number of faults, X, per metre of cloth. 
He finds that there are, on average, 0.8 faults per metre. 


a State, giving any assumptions you make, a suitable probability distribution for X. (2 marks)


b Find the probability that in a randomly chosen metre of cloth there are more than two 
faults. (1 mark) 


Simeon cuts the cloth into one-metre lengths. 
c Find the probability that the first time he encounters a metre of cloth with more than two 


faults is in the 7th metre. (3 marks) 
d Find the expectation and variance of the number of metres of cloth he cuts before he finds a 
metre length with more than two faults. (2 marks) 


If a metre of cloth containing two or more faults occurs before Simeon has cut 3 one-metre 
lengths, he sends that roll of cloth back to the manufacturers. 


e Find the probability that he sends back two consecutive rolls of cloth. (2 marks) 


3.3 The negative binomial distribution


The binomial distribution, X ~ B(n, p), models the number of successes in a fixed number of trials, n.
The probability of success in each trial, p, is constant and the trials are independent. 


Suppose, instead, you want to consider the number of
trials needed to achieve a fixed number of successes, r.
For example, suppose you roll a fair dice repeatedly until
you have rolled a total of three sixes. You can define a
discrete random variable, X, as the number of rolls needed.

Hint: If the first three rolls are all sixes,
then X = 3. X can take any integer value
greater than or equal to 3.


To find the probability distribution of X, consider what is required for X to take any particular value. 
For example, P(X = 10) is the probability that the third six occurs on the 10th roll. 
This means that exactly 2 sixes have been rolled
in the previous 9 rolls.

$P(\text{exactly 2 sixes in first 9 rolls}) = \binom{9}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^7$

The number of sixes rolled in 9 rolls has
the binomial distribution B(9, 1/6). ← Statistics and Mechanics Year 1, Chapter 6
You then need to roll a six on the 10th roll, with probability 1/6. Since each roll is assumed to be
independent:


P(X = 10) = P(exactly 2 sixes in first 9 rolls) × P(six on the 10th roll)
$= \binom{9}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^7 \times \frac{1}{6}$
$= \binom{9}{2}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^7$


You can calculate the probabilities for other values of X in a similar way, giving you the following 
probability function for X: 


$P(X = x) = \binom{x-1}{2}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^{x-3}$
This is an example of a negative binomial distribution. 


■ For successive trials, each with constant probability of success, p, the number of trials needed
to get r successes, X, has the negative binomial distribution, with probability function:

$P(X = x) = p(x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}$, x = r, r + 1, r + 2, ...

Notation: There is no standard notation for the
negative binomial distribution, but you can write
X ~ NB(r, p) or X ~ Negative B(r, p).

This is the probability of r − 1 successes in
x − 1 trials multiplied by the probability of
success in the xth trial.


Philomena is practising her piano scales. The probability that she completes a scale correctly on any
one attempt is 0.4. She continues practising until she has completed four scales correctly.

a Find the probability that she completes her fourth correct scale on her 12th attempt.
b Find the probability that she completes her fourth correct scale on her 10th attempt, given that
her first scale was correct.

a X = number of attempts needed to complete four scales without a mistake
X ~ Negative B(4, 0.4)
$P(X = 12) = \binom{11}{3} \times 0.4^4 \times 0.6^8$
= 0.0709 (4 d.p.)

b Y = number of attempts needed after the first attempt
Y ~ Negative B(3, 0.4)
$P(Y = 9) = \binom{8}{2} \times 0.4^3 \times 0.6^6$
= 0.0836 (4 d.p.)

Problem-solving | If the first scale was correct, Philomena now must complete three scales correctly in exactly
nine further attempts. This means she needs to complete 2 correctly in the next 8 attempts, with
probability $\binom{8}{2} \times 0.4^2 \times 0.6^6$, and then complete the final scale correctly, with probability 0.4.

Online | Explore the negative binomial distribution using GeoGebra.
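
The conditional probability in part b can be checked without the formula by listing every possible sequence of ten attempts. The sketch below is an illustrative brute-force check in Python, not the method used in the text; the helper names are hypothetical. It computes P(fourth correct scale on attempt 10 | first attempt correct) as a ratio of two sums.

```python
from itertools import product

P_CORRECT = 0.4  # probability of a correct scale on any one attempt (from the example)


def sequence_probability(outcomes):
    """Probability of one particular sequence of correct (True) / incorrect (False) attempts."""
    prob = 1.0
    for correct in outcomes:
        prob *= P_CORRECT if correct else 1 - P_CORRECT
    return prob


def fourth_success_on_last(outcomes):
    """True when the final attempt is correct and is exactly the fourth correct one."""
    return outcomes[-1] and sum(outcomes) == 4


joint = 0.0      # P(first attempt correct AND fourth correct scale on attempt 10)
condition = 0.0  # P(first attempt correct)
for outcomes in product([True, False], repeat=10):
    if outcomes[0]:
        prob = sequence_probability(outcomes)
        condition += prob
        if fourth_success_on_last(outcomes):
            joint += prob

conditional = joint / condition
print(round(conditional, 4))
```

The result agrees with the value found from Y ~ Negative B(3, 0.4), which is why restarting the count after the first success is a valid shortcut when attempts are independent.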
Exercise 3C

1 Aulden throws a fair four-sided dice. Find the probability that he throws a 4 for the third time
on his sixth throw.

2 Billie spins a coin, biased towards heads. If the probability of spinning a head is 0.55, find the
probability that Billie spins her fourth head on her seventh spin.

3 Chuck is shooting at a target with a bow and arrow. The probability that he hits the bullseye on
any particular shot is 0.15. Find the probability that Chuck hits the bullseye for the second time
on his tenth shot.

4 Denise takes part in a multiple-choice quiz where she picks the answers at random.
Given that her probability of picking any correct answer is 0.25, find the probability that:
a she picks her first correct answer on her third question
b she picks her fourth correct answer on her seventh question
c she gets exactly two correct answers in the first ten questions
d her third correct answer occurs on or before the tenth question.

Watch out | Not all of the parts of this question require you to use the negative binomial distribution.

5 Eliot plays tennis and his probability of winning any particular match is constant such that
P(win) = 0.3. Find the probability that:
a he wins his first match on the fourth attempt
b he needs to play more than ten matches to win four times
c he wins his third match on the eighth attempt
d his fifth win occurs on or before his twelfth game.

Hint | In part b you need to calculate the probability that he wins fewer than four of his first
10 matches.

6 Francesca is playing a series of games with her sister. The probability that Francesca wins any
particular game is 0.55.
a Find the probability that Francesca wins her fourth game on the sixth attempt. (2 marks)
b State two assumptions that have to be made for the model used in part a to be valid. (2 marks)
c Find the probability that Francesca wins her third game on the fifth attempt, given that
she won the first game. (2 marks)
d Find the probability that Francesca wins at least seven out of the first ten games. (3 marks)

7 Gerald is trialling a new drug that in previous trials has cured patients 80% of the time.
a Find the probability that Gerald's drug cures the seventh patient on the tenth trial. (2 marks)
b State two assumptions that have to be made for the model used in part a to be valid. (2 marks)
c Find the probability that Gerald has cured at least seven patients in the first twelve
trials. (3 marks)
d Find the probability that it takes more than 20 trials to cure 15 patients. (2 marks)
(E/P) 8 Harriet is conducting scientific experiments. Each experiment is independent and the
probability of any single experiment working is 0.1.
a Find the probability that Harriet achieves her second successful experiment on her
20th attempt. (2 marks)
b Find the probability that it takes her more than 30 experiments to get four successes. (2 marks)
c Find the probability that Harriet achieves her fourth successful experiment on her
30th attempt given that she was first successful on the fifth attempt. (2 marks)
d Find the probability that Harriet is successful in at least three experiments out of the
first 25. (3 marks)

(P) 9 The random variable X has negative binomial distribution, Negative B(5, 0.7).
Find:
a P(X = 10) b P(X ≤ 6)
c P(X ≤ 15) d P(X > 12)

Problem-solving | Part b is asking you to find the probability that the 5th
success occurs either on the 5th trial or the 6th trial.


(E/P) 10 A darts player is trying to hit the bullseye. She throws darts at the board until she scores three 
bullseyes. The random variable D is the number of darts she needs to throw. 


a State a suitable distribution to model D. (1 mark) 
b Given that the probability of her hitting the bullseye on each attempt is 0.35, find the 
probability that: 
i it takes her seven throws to score the three bullseyes (2 marks) 
ii it takes her at least eight throws to score the three bullseyes (2 marks) 
iii it takes her nine throws, given that she hits the bullseye on her first throw. (2 marks) 
c Give one reason why this model may not accurately represent the situation. (1 mark) 


Challenge 


The function $F_{n,p}$ is defined as follows:
$F_{n,p}(x) = P(X \le x)$ where X ~ B(n, p)
a Given that Y ~ Negative B(3, 0.4), show that P(Y ≤ 8) can be written as
$1 - F_{n,p}(x)$, where n, p and x are values to be found.
b Given that Y ~ Negative B(r, p), find a general expression for P(Y ≤ y) in
terms of $F_{n,p}(x)$ for suitable n and x.


3.4 Mean and variance of the negative binomial distribution

You need to be able to make use of the following results about the mean (or expected value) and
variance of a random variable with the negative binomial distribution.

■ If X ~ Negative B(r, p), then:
• Mean of X: $E(X) = \mu = \frac{r}{p}$
• Variance of X: $\mathrm{Var}(X) = \sigma^2 = \frac{r(1 - p)}{p^2}$

Iona and Juan play noughts and crosses. The probability that Iona wins is 0.7. The random variable
X represents the number of games that they need to play for Iona to win seven times.

a Find E(X) and Var(X).

They change games and start playing chess. The random variable Y represents the number of
games that they need to play for Juan to win twice. Given that the mean of Y is 6, find:

b Juan's probability of winning any single game of chess
c the standard deviation of Y.


a X ~ Negative B(7, 0.7)
$E(X) = \frac{r}{p} = \frac{7}{0.7} = 10$
$\mathrm{Var}(X) = \frac{r(1 - p)}{p^2} = \frac{7(1 - 0.7)}{0.7^2} = 4.29$ (3 s.f.)

b Y ~ Negative B(2, p)
$E(Y) = \frac{2}{p} = 6$, so $p = \frac{1}{3}$
The probability of Juan winning any one game of chess is $\frac{1}{3}$.

c $\mathrm{Var}(Y) = \frac{2\left(1 - \frac{1}{3}\right)}{\left(\frac{1}{3}\right)^2} = 12$
Standard deviation = $\sqrt{12}$ = 3.46 (3 s.f.)

Watch out | The 'mean' and the 'expected value' of a random variable are the same thing.
Remember that the standard deviation is the square root of the variance.
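
The arithmetic in this example can be reproduced with a short script. This is a minimal Python sketch under the formulas stated in this section; the function names `nb_mean` and `nb_variance` are illustrative and do not come from the text.

```python
from math import sqrt


def nb_mean(r: int, p: float) -> float:
    """E(X) = r / p for X ~ Negative B(r, p)."""
    return r / p


def nb_variance(r: int, p: float) -> float:
    """Var(X) = r(1 - p) / p^2 for X ~ Negative B(r, p)."""
    return r * (1 - p) / p**2


# Part a: Iona needs seven wins, each with probability 0.7
mean_x = nb_mean(7, 0.7)
var_x = nb_variance(7, 0.7)

# Part b: E(Y) = 2 / p = 6, so rearrange for p
p_juan = 2 / 6

# Part c: variance and standard deviation of Y
var_y = nb_variance(2, p_juan)
sd_y = sqrt(var_y)

print(round(mean_x, 2), round(var_x, 2), round(p_juan, 4), round(var_y, 2), round(sd_y, 2))
```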


Exercise 3D


1 The random variable X has negative binomial distribution with p = 0.4 and r = 3. Find: 
a E(X) b Var(X) 


2 The random variable Y has negative binomial distribution with p = 0.75 and r = 10. Find: 
a E(Y) b Var(Y) 


3 The random variable M has negative binomial distribution with probability p, and with r = 2. 
Given that E(M) = 8, find: 
a the value of p b P(M = 5) c Var(M)


(P) 4 D ~ Negative B(8, p).
a Given that Var(D) = 30, find the value of p.
b Find:
i P(D = 12) ii P(D < 10 | first trial was successful)

Problem-solving | p is a probability so it must be a positive number.


5 X ~ Negative B(r, 0.4).
a Given that E(X) = 15, find the value of r.
b Find:
i P(X = 10) ii P(X < 8)
(E/P) 6 The random variable X has negative binomial distribution with mean 6 and variance 3.
Find:
a the value of p and the value of r
b P(X = 4).

Problem-solving | Write two equations involving r and p and solve
them simultaneously.

(E) 7 Kelly and her classmates are taking part in a competition where students take turns attempting
to solve a puzzle. The probability that each student solves the puzzle is 0.7. The random
variable X represents the number of students who need to attempt to solve the puzzle before
five have solved it.

a State two conditions that are necessary for X to be modelled by a negative binomial
distribution. (2 marks)

b Using a negative binomial model, find the mean and standard deviation of X. (3 marks)


(E) 8 In each trial of an experiment, four fair coins are spun. 
a Find the probability that all four coins show the same result. (2 marks) 


b Find the probability that all four coins show the same result for the third time on the 
sixth trial. (3 marks) 


c Find the expected number of trials needed in order for the coins to show the same result 
12 times. (2 marks) 


(E) 9 Michelle is playing darts. The probability that she hits ‘treble twenty’ with any one dart is p. 
Given that the expected number of throws needed in order for her to hit the ‘treble twenty’ 
3 times is 18.75, find: 


a the value of p (1 mark) 
b the variance. (2 marks) 
Michelle gets some coaching. Given that her new probability of hitting the 'treble twenty' is
0.24, find: 

c the expected number of throws needed to hit the ‘treble twenty’ 5 times (1 mark) 


d the probability that it takes her more than the expected number of throws to hit the ‘treble 
twenty’ 5 times. (3 marks) 


(E/P) 10 Norman picks marbles at random from a bag that contains 100 marbles. He notes the colour 
and replaces the marble. He repeats the process until he has selected a green marble r times. 
The random variable X represents the total number of times he selects a marble. 
a State a distribution that could be used to model X. (1 mark) 
b Given that the mean and standard deviation of X are 12 and 6 respectively, calculate: 
i the number of green marbles in the bag 
ii the value of r. (4 marks) 


Alison selects marbles from the same bag. She notes the colour but does not replace the marble 
each time. 


c Give a reason why a negative binomial distribution is not a suitable model for this 
situation. (1 mark) 


d Find the probability that Alison picks her second green marble on her third pick. (3 marks) 
Mixed exercise 3

1 An unbiased eight-sided dice is thrown repeatedly. The first multiple of 3 appears on the rth
throw. Calculate the probability that:
a r = 5 (2 marks)
b the value of r is at least 3. (3 marks)

(E) 2 An engineer is checking welds on an oil tanker. The percentage of defective welds is thought to
be 10%. If X represents the number of welds checked up to and including the first defective one,
a state the distribution that can be used to model X (1 mark)
b find the mean and variance of X (2 marks)
c find the probability that the engineer has to check at least 12 welds before finding a
defective one. (2 marks)

3 Olivia is playing hoopla and she continues to throw the hoop until she hits the target.
The random variable X represents the number of throws she needs.
a State a suitable distribution to model X. (1 mark)
Given that the mean of X is 6,
b find the probability that Olivia hits the target first on her fifth attempt (2 marks)
c find the variance of X. (2 marks)
d State any assumptions you have made in using this model. (2 marks)

4 Soujit is designing a game for a charity day. In his game, contestants have to roll a fair ten-sided
dice a certain number of times. If the contestant rolls a 10 then they win.
Soujit wants the probability of winning to be less than 0.5.
Find the maximum number of times Soujit should allow contestants to roll the dice. (4 marks)

5 A supermarket knows from experience that when they purchase avocados from a particular
supplier, any particular one of them has a 0.02 chance of being over-ripe.
A box of 24 avocados will be rejected if more than three are over-ripe.
a Find the probability that a particular box of avocados is rejected. (2 marks)
The supermarket is unloading a shipment of boxes.
b Find the probability that the 20th box unloaded is the first to be rejected. (3 marks)

6 Pablo is playing a fairground game where his probability of winning a prize is 0.2.
He plays the game several times.
a Find the probability that he first wins a prize on his sixth game. (1 mark)
b Find the probability that he wins his second prize on his tenth game. (2 marks)
c Find the mean and standard deviation of the number of games Pablo needs to play to
win his fifth prize. (3 marks)
Quinn plays a different game until he has won r prizes. Given that X represents the number
of games Quinn plays and that E(X) = 12 and Var(X) = 16,
d find the probability of Quinn winning a game. (3 marks)
(E/P) 7 The random variable X is the number of times a biased dice is rolled until 4 sixes have occurred.
The variance of X is 15.

a Find the probability of rolling a six. (3 marks)
b Find P(X = 10). (2 marks)
c Find P(X > 8). (3 marks)
d Find P(X = 9 | a six occurs on the first roll). (3 marks)


(E/P) 8 Roberta is taking part in a penalty shootout contest. The probability that she scores a goal on 
any one attempt is 0.65. 


a Show that the probability that she first scores a goal on her second attempt is 0.2275. (2 marks) 
b Find the probability that: 


i she scores a goal exactly 5 times during her first 8 attempts (1 mark) 
ii she scores her fifth goal on her 8th attempt (2 marks) 
iii she takes more than 9 attempts to score 5 goals (2 marks) 
iv she scores a goal exactly 4 times in 7 attempts, given that she scores on each of her first 
two attempts. (3 marks) 


c Calculate the mean and standard deviation of the number of attempts she needs to score 

5 goals. (3 marks) 
Sukie decides to take part as well. Her probability of scoring a goal on any one attempt is 0.4. 
Roberta and Sukie take it in turns to shoot at the goal, with Sukie going first. The first girl to 
score a goal wins. 
d Find the probability that Roberta wins on her first attempt. (2 marks) 
e Find the probability that Sukie wins on her second attempt. (2 marks) 
f The contest is drawn if neither girl scores a goal with three attempts. 

Find the probability that the contest is drawn. (2 marks) 


Challenge 


1 In a fairground game, a player throws bean bags at a target.
A particular player hits the target with probability p. The random
variable X is the number of attempts needed by the player to hit the
target twice.
a Write down the distribution of X.
The random variables Y₁ and Y₂ are defined as:
Y₁ = number of attempts needed to hit the target the first time
Y₂ = number of attempts after the first hit needed to hit the target the
second time
b Write down the distribution of Y₁ and Y₂.
c Write X in terms of Y₁ and Y₂.
d Hence show that $E(X) = \frac{2}{p}$.

2 Use a similar technique to that outlined in question 1 to prove that, for
the random variable X ~ Negative B(r, p),
$E(X) = \frac{r}{p}$ and $\mathrm{Var}(X) = \frac{r(1 - p)}{p^2}$

Hint | You may assume that, for the random variable Y ~ Geo(p),
$E(Y) = \frac{1}{p}$ and $\mathrm{Var}(Y) = \frac{1 - p}{p^2}$
Summary of key points

1 For successive independent trials, each with constant probability of success, p, the number of
trials needed to get one success, X, has the geometric distribution, with probability function:

$P(X = x) = p(x) = p(1 - p)^{x-1}, \quad x = 1, 2, 3, \ldots$

2 If X ~ Geo(p), then the cumulative geometric distribution is given by:
$P(X \le x) = 1 - (1 - p)^x, \quad x = 1, 2, 3, \ldots$

You can also deduce that
$P(X > x) = (1 - p)^x$
$P(X \ge x) = (1 - p)^{x-1}$

3 If X ~ Geo(p), then:
• Mean of X: $E(X) = \mu = \frac{1}{p}$
• Variance of X: $\mathrm{Var}(X) = \sigma^2 = \frac{1 - p}{p^2}$

4 For successive trials, each with constant probability of success, p, the number of trials needed
to get r successes, X, has the negative binomial distribution, with probability function:

$P(X = x) = p(x) = \binom{x-1}{r-1} p^r (1 - p)^{x-r}, \quad x = r, r+1, r+2, \ldots$

This is the probability of r − 1 successes in x − 1 trials multiplied by the probability of success
in the xth trial.

5 If X ~ Negative B(r, p), then:
• Mean of X: $E(X) = \mu = \frac{r}{p}$
• Variance of X: $\mathrm{Var}(X) = \sigma^2 = \frac{r(1 - p)}{p^2}$
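
The key points fit together in ways that are easy to check numerically. The sketch below is an illustrative Python check, not part of the original summary; the value p = 0.25 and the function names are assumptions chosen for the example. It confirms that summing the geometric probability function reproduces the cumulative formula, and that the negative binomial distribution with r = 1 is the geometric distribution.

```python
from math import comb


def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = p(1 - p)^(x - 1) for X ~ Geo(p)."""
    return p * (1 - p) ** (x - 1)


def geometric_cdf(x: int, p: float) -> float:
    """P(X <= x) = 1 - (1 - p)^x for X ~ Geo(p)."""
    return 1 - (1 - p) ** x


def negative_binomial_pmf(x: int, r: int, p: float) -> float:
    """P(X = x) for X ~ Negative B(r, p), valid for x >= r."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


p = 0.25  # illustrative value only
for x in range(1, 8):
    summed = sum(geometric_pmf(k, p) for k in range(1, x + 1))
    # Summing the probability function gives the cumulative formula
    assert abs(summed - geometric_cdf(x, p)) < 1e-12
    # P(X > x) is the complement of the cumulative probability
    assert abs((1 - geometric_cdf(x, p)) - (1 - p) ** x) < 1e-12
    # With r = 1 the negative binomial reduces to the geometric distribution
    assert abs(negative_binomial_pmf(x, 1, p) - geometric_pmf(x, p)) < 1e-12
```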
Objectives

After completing this chapter you should be able to:

• Use hypothesis tests to test for the mean λ of a Poisson distribution
• Find critical regions of a Poisson distribution using tables
• Use hypothesis tests to test for the parameter p in a geometric distribution
• Find critical regions of a geometric distribution

Prior knowledge check

1 The random variable X ~ Po(4). Find:
a P(X = 5) b P(X < 3)    ← Chapter 2



The geometric distribution can be used to 
model the time elapsed between weather 
events. A hypothesis test for the parameter 
of a geometric distribution can help 
determine whether an observed weather 
event is statistically significant. 

→ Mixed exercise, Q11


CaP (Che) GE CX) 


2 The probability that a component manufactured by a factory is defective is
known to be 0.0037. The random variable X represents the number of components
manufactured up to and including the first defective component. Find:
a P(X ≤ 30) b P(X > 50)
c E(X)    ← Sections 3.1, 3.2

3 A single observation is taken from the random variable X ~ B(25, p) and is used
to test H₀: p = 0.2 against H₁: p > 0.2 at the 10% level of significance.
Find the critical region for this test.    ← Statistics and Mechanics Year 1, Chapter 7

Hypothesis testing 


4.1 Testing for the mean of a Poisson distribution

A Poisson distribution can be used to model situations when events occur at random, but at
a constant average rate.

■ To carry out a hypothesis test for the mean of a Poisson distribution, you form two hypotheses.
• The null hypothesis, H₀: λ = m, is the value of the mean that you assume to be true.
• The alternative hypothesis, H₁, tells you about the value of the mean if your assumption
is shown to be wrong.


Testing for the mean allows you to answer questions such as: 

e Does the servicing of a machine decrease the rate at which it produces defective items? 

e Does the introduction of a pelican crossing reduce the number of accidents along a particular 
stretch of road? 


You will need to find values from the cumulative Poisson distribution to carry out hypothesis tests. 
You can use your calculator, or the tables given on page 191.


Example 


Accidents used to occur at a certain road junction at the rate of 6 per month. The residents 
petitioned for traffic lights. In the month after the lights were installed there was only one accident. 
Test, at the 5% level of significance, whether there is evidence that the lights have reduced the rate 
of accidents. 


Let the random variable X = the number of accidents in a month
H₀: λ = 6    H₁: λ < 6
Assume H₀, so that X ~ Po(6).

Significance level 5%
P(X ≤ 1) = 0.0174
0.0174 < 0.05

Therefore there is sufficient evidence at the 5% level to reject H₀ and conclude that
the lights have reduced the number of accidents.
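
The same one-tailed test can be carried out numerically instead of from tables. This is a minimal Python sketch, assuming the Poisson model in the example; `poisson_cdf` is a hypothetical helper written for this illustration, not a function from the text.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam), summed directly from the probability function."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


lam_null = 6   # H0: lambda = 6 accidents per month
observed = 1   # accidents in the month after the lights were installed
alpha = 0.05   # 5% significance level

# H1: lambda < 6, so the evidence against H0 lies in the lower tail
p_value = poisson_cdf(observed, lam_null)
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```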


Example 


Over a long period of time, Fatima found that the bus taking her to school was late at a rate 

of 2.5 times per month. In the month following the start of the new summer bus schedules, 
Fatima finds that her bus is late 6 times. Assuming that the number of times the bus is late each 
month has a Poisson distribution, test at the 2% level of significance, whether or not the new 
schedules changed the frequency with which the bus is late. 
Let the random variable X = the number of times the bus is late in a month
H₀: λ = 2.5    H₁: λ ≠ 2.5
Assume H₀, so that X ~ Po(2.5).

Significance level 2%, so significance level in each tail is 1%.
P(X ≥ 6) = 1 − P(X ≤ 5) = 1 − 0.9580
= 0.0420
0.0420 > 0.01

Problem-solving | The mean is 2.5. The observed value, 6, is greater
than the mean so it lies in the top half of the
distribution. You need to find P(X ≥ 6) and
compare this answer with 0.01.

There is insufficient evidence at the 2%
level to reject H₀, so conclude that the new
schedules have not changed the frequency
with which the bus is late.
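
For a two-tailed test, the tail probability is compared with half the significance level. The following Python sketch illustrates that comparison for the bus example; the helper names are hypothetical, and choosing the tail by comparing the observation with the mean follows the reasoning in the Problem-solving note.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def tail_probability(observed: int, lam: float) -> float:
    """Probability of a value at least as extreme as `observed`, in the tail it falls in."""
    if observed > lam:
        return 1 - poisson_cdf(observed - 1, lam)  # upper tail: P(X >= observed)
    return poisson_cdf(observed, lam)              # lower tail: P(X <= observed)


lam_null = 2.5  # H0: lambda = 2.5 late buses per month
observed = 6
alpha = 0.02    # 2% level, so 1% in each tail

tail = tail_probability(observed, lam_null)
reject_null = tail < alpha / 2
print(round(tail, 4), reject_null)
```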


You might have to use a Poisson distribution as an approximation to a binomial distribution when 
carrying out a hypothesis test. 


Example 


During an influenza epidemic, 4% of a population of a large city were affected on a given 

day. The manager of a factory that employs 250 people found that 17 of the employees at his 
factory were absent, claiming to be suffering from influenza. Using a Poisson approximation to 
the binomial distribution and a 5% level of significance, test whether or not the proportion of 
employees suffering from influenza at his factory was larger than that of the whole city. 


Let the random variable X = the number
of employees out of 250 suffering from
influenza

H₀: p = 0.04    H₁: p > 0.04
Assume H₀, so that X ~ B(250, 0.04).
Under a Poisson approximation,
X ~ Po(250 × 0.04), so X ~ Po(10)
Significance level 5%
P(X ≥ 17) = 1 − P(X ≤ 16) = 1 − 0.9730
= 0.0270
0.0270 < 0.05

There is sufficient evidence to reject H₀, and
conclude that the proportion of employees
at his factory having influenza is greater than
that of the whole city.
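
It can be instructive to see how close the Poisson approximation is to the exact binomial tail in this example. The Python sketch below is an illustrative comparison, not part of the original solution; both helper functions are assumptions written for this check.

```python
from math import comb, exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 250, 0.04
observed = 17

# Upper-tail probabilities P(X >= 17) under each model
poisson_tail = 1 - poisson_cdf(observed - 1, n * p)
binomial_tail = 1 - binomial_cdf(observed - 1, n, p)

print(round(poisson_tail, 4), round(binomial_tail, 4))
```

Both tail probabilities fall below 0.05, so the conclusion of the test does not depend on using the approximation here.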
Exercise 4A

1 A single observation is taken from a Poisson distribution Po(λ) and a value of 3 is obtained.
Use this observation to test H₀: λ = 8 against H₁: λ < 8, using a 5% level of significance.

2 A random variable X has a Poisson distribution Po(λ). A single observation of x = 2 is taken
from the distribution. Test at the 5% level of significance, H₀: λ = 6.5 against H₁: λ < 6.5.

3 A single observation is taken from a Poisson distribution Po(λ) and a value of 8 is obtained.
Use this observation to test H₀: λ = 5.5 against H₁: λ > 5.5, using a 5% level of significance.

4 A random variable X has a Poisson distribution Po(λ). A single observation of x = 10 is taken
from the distribution. Test at the 5% level of significance, H₀: λ = 5.5 against H₁: λ ≠ 5.5.

Hint | This is a two-tailed test. 10 > 5.5 so calculate P(X ≥ 10) and compare the
answer with 0.025.

5 The number of misprints on each page of the Daily Moaner is found to have a Poisson
distribution with mean 7.5. Soon after a new proofreader is employed, the editor finds one day
that there are 13 misprints on a particular page, and claims that the mean number of misprints
has increased. Test this claim at the 5% level of significance.

6 On a stretch of road, accidents occur at a rate of 0.8 per month. In the month following new
markings on the road, it is found there are 3 accidents. Is there any evidence at the 5% level of
significance to suggest an increase in the rate of accidents along this stretch of road?

7 A restaurant has a coffee machine which seizes up and stops working at a rate of
0.2 times per week. In the 5 weeks following the introduction of a new brand of coffee,
the machine seizes up 3 times. Is there any evidence, at the 5% level of significance,
to suggest that the rate at which the coffee machine seizes up has increased?

Problem-solving | Your random variable should be the number of
times the machine seizes up in 5 weeks, so the mean under H₀ is 5 × 0.2 = 1.

8 The number of houses sold per week by a firm of estate agents follows a Poisson distribution
with a mean of 2.25. The firm appoints a new salesman. In the four-week period following the
appointment, the number of sales is 6. Test, at the 5% level of significance, whether or not there
is evidence to suggest the rate of sales has changed.

9 The number of accidents each week at a crossroads controlled by traffic lights may be modelled
by a Poisson distribution with mean 1.25. The timings on the lights are changed and in the next
6 weeks there is a total of 4 accidents. Is there any evidence, at a 5% level of significance, of a
reduction in the mean weekly number of accidents?

10 The average number of flaws per 50 m of cloth produced by a machine is found to be 2.3.
After the machine is serviced, the number of flaws in the first 150 m is found to be 3.
Test, at the 5% level of significance, whether or not the average number of flaws has changed.

11 The number of breakdowns per day in a large fleet of hire cars has a Poisson distribution with
mean 0.3.
Find the probability that in a 20-day period the number of breakdowns is:
a exactly 5 (2 marks)
b no more than 8. (2 marks)
The hire car company introduces new servicing guidelines in order to try to decrease
the number of cars that break down. In a randomly chosen 30-day period following the
introduction of the new measures, 5 cars break down.
c Test, at the 5% level of significance, whether or not the mean number of breakdowns has
decreased. State your hypotheses clearly. (4 marks)


12 A doctor expects to see, on average, 2.25 patients per week with a particular condition. 
The doctor decides to send information to her patients to try and reduce the number of 
patients she sees with the condition. In the first four weeks after the information is sent, she 
sees 4 patients with the condition. Test at the 5% level of significance, whether or not there is 
reason to believe that sending the information has reduced the number of times the doctor 
sees patients with the condition. (5 marks) 


13. Breakdowns occur on a particular machine at a rate of 1.5 every week. A manager feels that 
the rate of breakdowns has changed and decides to monitor the machine. Over a 6-week period 
she finds that there are 13 breakdowns. Test at the 5% level of significance, whether or not the 
manager’s suspicion is correct. (5 marks) 


14 A factory produces components, of which 1% are defective. The components are packed in 
boxes of 1000. 


a Using a Poisson approximation to the binomial distribution, estimate the probability that a 
randomly chosen box contains: 


i exactly 9 defective components (2 marks) 
ii no more than 7 defective components. (2 marks) 
b Explain why this approximation is suitable. (2 marks) 


The machinery in the factory is serviced and it is found that in the first box produced following 
the servicing there are 5 defective components. 


c Is there evidence to suggest at the 5% level of significance that the servicing has reduced the 
number of defective components? State your hypotheses clearly. (4 marks) 


4.2 Finding critical regions for a Poisson distribution

The critical region is the range of values of the test statistic that would lead to you rejecting H₀.
The value(s) on the boundary of the critical region are called critical value(s).

You need to be able to find a critical region for a one- or two-tailed test of the mean of a Poisson
distribution.

■ A one-tailed test has an alternative hypothesis H₁: θ < m or H₁: θ > m.
There is a single part to the critical region and one critical value.

■ A two-tailed test has an alternative hypothesis H₁: θ ≠ m.
There are two parts to the critical region and two critical values.

Notation | θ is the parameter you are testing for. In this case, it
would be the mean of the Poisson distribution, λ.


Example 


An estate agent has been selling houses at a rate of 9 per month. He believes that the rate of sales 

will decrease in the next month. 

a Using a 5% level of significance, find the critical region for a one-tailed test of the hypothesis 
that the rate of sales will decrease from 9 per month. 

b Write down the actual significance level of the test in part a. 
a Let the random variable X = the number of
house sales in a month
H₀: λ = 9    H₁: λ < 9
Assume H₀, so that X ~ Po(9).
Significance level 5%
Require P(X ≤ c) < 0.05

From tables
P(X ≤ 3) = 0.0212 and P(X ≤ 4) = 0.0550
P(X ≤ 3) < 0.05 and P(X ≤ 4) > 0.05
Hence the critical region is X ≤ 3.

It is often easier to find critical values
by looking at tables than by using your calculator.

b Actual significance level = P(X ≤ 3)
= 0.0212

Online | Explore critical regions for a
Poisson distribution using GeoGebra.
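
The table lookup in this example amounts to scanning the cumulative probabilities until they reach the significance level. The following is a minimal Python sketch of that search, assuming a lower-tail test; `lower_critical_value` is a hypothetical helper, not a function from the text.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def lower_critical_value(lam: float, alpha: float):
    """Largest c with P(X <= c) < alpha, or None if no such value exists."""
    critical = None
    k = 0
    while poisson_cdf(k, lam) < alpha:
        critical = k
        k += 1
    return critical


c = lower_critical_value(9, 0.05)            # H0: lambda = 9, 5% level
actual_significance = poisson_cdf(c, 9)      # P(X <= c) under H0
print(c, round(actual_significance, 4))
```

The actual significance level is smaller than 5% because X only takes whole-number values, so the cumulative probability jumps past 0.05 between X = 3 and X = 4.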


Example 


Leonora Metti is regarded as a super striker who plays for Statistics All Stars. The mean number of 
goals she has scored over the several years she has been with the club is 0.7 goals per game. She has 

now been transferred to Mechanics Ladies. The management at Mechanics Ladies wish to see if the 
rate at which she scores goals has now increased. They monitor the number of goals that she scores 

in her first 10 games. 


Assuming a Poisson distribution, 


a use a 5% level of significance to find the critical region for a one-tailed test of the hypothesis that 
Leonora has increased her rate of scoring from 0.7 goals per game 


b write down the actual significance level of the test in part a. 
Leonora scores 11 goals in her first 10 games for Mechanics Ladies. 


c Comment on this observation in light of your critical region. 


a Let the random variable X = the number of
goals scored by Leonora Metti in 10 games
H₀: λ = 0.7    H₁: λ > 0.7

Problem-solving | The assumed rate is 0.7 goals per game, so the
mean for the number of goals scored in 10 games
will be 0.7 × 10 = 7

Assume H₀, so that X ~ Po(7).
Significance level 5%
Require P(X ≥ c) < 0.05

From tables
P(X ≥ 12) = 1 − P(X ≤ 11) = 1 − 0.9467
= 0.0533
P(X ≥ 13) = 1 − P(X ≤ 12) = 1 − 0.9730
= 0.0270

P(X ≥ 12) > 0.05 and P(X ≥ 13) < 0.05
so the critical value is 13.

Hence the critical region is X ≥ 13.

b Actual significance level = P(rejecting H₀)
= P(X ≥ 13) = 0.0270

c X = 11 does not lie in the critical region, so
there is insufficient evidence to reject H₀.
Conclude that Leonora has not increased her
goal-scoring rate.
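
An upper-tail critical region can be found by the same kind of search, working with P(X ≥ c) instead. This is an illustrative Python sketch for the example above; `upper_critical_value` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam); returns 0 when k is negative."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def upper_critical_value(lam: float, alpha: float) -> int:
    """Smallest c with P(X >= c) < alpha."""
    c = 0
    while 1 - poisson_cdf(c - 1, lam) >= alpha:
        c += 1
    return c


lam_null = 0.7 * 10  # 0.7 goals per game over 10 games
c = upper_critical_value(lam_null, 0.05)
actual_significance = 1 - poisson_cdf(c - 1, lam_null)  # P(X >= c) under H0
in_critical_region = 11 >= c                            # Leonora's 11 goals

print(c, round(actual_significance, 4), in_critical_region)
```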




Example 


An office finds that, over a long period of time, incoming calls from customers occur at a rate of 
0.325 per minute. 


They believe that the rate of calls has changed recently. To test this, the number of incoming calls 
during a random 20-minute interval is recorded. 


a Find the critical region for a two-tailed test of this hypothesis. The probability in each tail should 
be as close to 2.5% as possible. 


b Write down the actual significance level of the test. 
The actual number of calls recorded in the 20-minute period was 13. 
c Comment on this observation in light of your critical region. 


a Let the random variable X = the number of incoming calls in a
20-minute interval

Assumed rate is 0.325 per minute, so in 20 minutes you would
expect 20 × 0.325 = 6.5 calls, hence

H₀: λ = 6.5    H₁: λ ≠ 6.5
Assume H₀, so that X ~ Po(6.5).

If X = c₁ is the upper boundary of the lower critical region, we
require P(X ≤ c₁) to be as close as possible to 2.5%.

From tables
P(X ≤ 2) = 0.0430 and P(X ≤ 1) = 0.0113
0.0113 is closer to 0.025, so c₁ = 1, hence lower critical region
is X ≤ 1.

If X = c₂ is the lower boundary of the upper critical region, we
require P(X ≥ c₂) to be as close as possible to 2.5%.

From tables
P(X ≥ 12) = 1 − P(X ≤ 11) = 1 − 0.9661 = 0.0339
and P(X ≥ 13) = 1 − P(X ≤ 12) = 1 − 0.9840 = 0.0160
Since 0.0339 is closer to 0.025, we choose c₂ to be 12,
hence upper critical region is X ≥ 12.

Watch out | The probability in each tail has to be as
close to 2.5% as possible, but it doesn't necessarily
have to be less than 2.5%.

Hence critical region is X ≤ 1 or X ≥ 12.

b Actual significance level = P(X ≤ 1) + P(X ≥ 12)
= 0.0113 + 0.0339 = 0.0452

c X = 13 is in the critical region so reject H₀. Conclude that there is
evidence to suggest that the rate of incoming calls has changed.
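
When each tail must be "as close as possible" to 2.5%, the search changes from a threshold test to a nearest-value comparison. The Python sketch below illustrates this for the example above; the helper names and the search limit `max_k` are assumptions made for the illustration.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam); returns 0 when k is negative."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def closest_lower_boundary(lam: float, target: float, max_k: int = 40) -> int:
    """Value c1 for which P(X <= c1) is nearest to the target tail probability."""
    return min(range(max_k), key=lambda c: abs(poisson_cdf(c, lam) - target))


def closest_upper_boundary(lam: float, target: float, max_k: int = 40) -> int:
    """Value c2 for which P(X >= c2) is nearest to the target tail probability."""
    return min(range(1, max_k), key=lambda c: abs(1 - poisson_cdf(c - 1, lam) - target))


lam_null = 20 * 0.325  # 6.5 calls expected in 20 minutes under H0
c1 = closest_lower_boundary(lam_null, 0.025)
c2 = closest_upper_boundary(lam_null, 0.025)

# Actual significance level: total probability of the two parts of the critical region
actual_significance = poisson_cdf(c1, lam_null) + (1 - poisson_cdf(c2 - 1, lam_null))
observed_in_region = 13 <= c1 or 13 >= c2

print(c1, c2, round(actual_significance, 4), observed_in_region)
```

The choice between 12 and 13 for the upper boundary is close, since the two candidate tail probabilities are almost equally far from 0.025, so working to four decimal places as the tables do is just enough to separate them.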
Hypothesis testing


Exercise 4B


1 A single observation is to be taken from a Poisson distribution with parameter λ. This value is
used to test H₀ against H₁. In each question part, find the critical region for the test, and write
down the actual significance level of the test.


a H₀: λ = 5.5; H₁: λ < 5.5 using a 5% level of significance
b H₀: λ = 8; H₁: λ > 8 using a 1% level of significance
c H₀: λ = 4; H₁: λ > 4 using a 5% level of significance


A fisherman is known to catch fish at a mean rate of 5 per hour. The number of fish caught 
by the fisherman in an hour follows a Poisson distribution. The fisherman buys some new 
equipment and wants to test whether or not there is an increase in the mean number of fish 
caught per hour. He records the number of fish he catches in a two-hour period. 


Using a 5% level of significance, find the critical region for this test. 


The number of sales made by Hans, a telephone sales person, averages 0.8 per day. 


Hans is given some extra training and his total sales over a period of 10 days is noted. Find 
the critical region of a test at the 5% level of significance to determine whether the daily sales 
achieved by Hans have increased. 


In the manufacture of cloth in a factory, defects occur randomly in the production process at
a rate of 1.3 per 5 m². The factory introduces a new procedure to manufacture the cloth. After
the introduction of this new procedure, the manager takes a random sample of 25 m² of cloth
from the next batch produced to test if there has been any decrease in the rate of defects.


Using a 5% level of significance, find the critical region for this test. 


Accidents occur randomly at a crossroads at a rate of 0.5 per month. A new system is 
introduced at the crossroads. The number of accidents in the next 12 months is recorded. 

Find the critical region of a test at the 5% level of significance to determine whether the rate of 
accidents has decreased. 


An online shop sells a computer game at an average rate of 0.35 per day. In an attempt to 
increase sales of the computer game, the price is reduced for 20 days. Find the critical region of 
a test at the 5% level of significance to determine whether the rate of sales has increased. 


A single observation is to be taken from a Poisson distribution with parameter λ. This value is
used to test H₀ against H₁. Using a 5% significance level, find the critical region for this test.
The probability of rejection in either tail should be as close as possible to 2.5%. Write down the
actual significance level of each test.


a H₀: λ = 4; H₁: λ ≠ 4   b H₀: λ = 8; H₁: λ ≠ 8   c H₀: λ = 9.5; H₁: λ ≠ 9.5

During term time, incoming calls to a school are thought to occur at a rate of 0.25 per minute. 
To test this, the number of calls during a random 30-minute interval is recorded. 


a Find the critical region for a two-tailed test of the hypothesis that the number of incoming 
calls occurs at a rate other than 0.25 per minute. The probability in each tail should be as 
close to 2.5% as possible. (3 marks) 
b Find the actual significance level of the above test. (1 mark)


The actual number of calls recorded in this 30-minute period was 11. 
c Comment on this observation in light of the critical region. (2 marks) 


Millie manufactures printed material. She knows that defects occur randomly in the 
manufacturing process at a rate of 1 every 7 metres. Once a week the machinery is cleaned and 
reset. Millie then takes a random sample of 35 metres of material from the next batch produced 
to test if there has been any change in the rate of defects. 


a Stating your hypotheses clearly and using a 10% level of significance, find the critical region 
for this test. You should choose your critical region so that the probability of rejection is less 
than 0.05 in each tail. (3 marks) 


b State the actual significance level of this test. (1 mark) 


A company claims that it receives emails at a mean rate of 3 every 5 minutes. 


a Give two conditions under which a Poisson distribution would be a suitable model for the 
number of emails received. (2 marks) 


To test its claim, the company records the number of emails received in a 15-minute period. 


b Using a 5% level of significance, find the critical region for a two-tailed test of the hypothesis 
that the mean number of emails received in a 15-minute period is different from 9. The 
probability of rejection in each tail should be as close as possible to 0.025. (3 marks) 


c Find the actual level of significance of this test. (1 mark) 
The actual number of emails received in this period was 13. 
d Comment on the company’s claim in the light of this value. Justify your answer. (2 marks) 


A single observation x is to be taken from a Poisson distribution with parameter λ.


This observation is to be used to test, at a 5% level of significance, H₀: λ = c against H₁: λ ≠ c,
where c is a positive integer. The probability in each tail is less than 0.025.


Given that the critical region for this test is X ≤ 2 or X ≥ 15,
a find the value of c, justifying your answer. (3 marks)
b find the actual significance level of this test. (2 marks)


4.3 Hypothesis testing for the parameter p of a geometric distribution


The geometric distribution can be used to model the number of trials until a single successful trial
is achieved.

Links: For a geometric distribution to be valid, the probability of success must be constant on
each trial and each trial must be independent. The parameter p used in a geometric distribution
is the probability of success on each trial. If X ~ Geo(p) then the mean of X is 1/p.
← Sections 3.1, 3.2

■ To carry out a hypothesis test for the parameter p of a geometric distribution,
you form two hypotheses.
• The null hypothesis, H₀: p = m, is the value of p that you assume to be true.
• The alternative hypothesis, H₁, tells you about the value of p if your assumption is shown
to be wrong.




Testing for p allows you to answer questions such as: 

• Is a dice biased because it takes me lots of rolls to get a 6?

• Does the introduction of a new drug reduce the probability of suffering a particular symptom of my
condition on any one day?


Unlike the parameters p of a binomial distribution and λ of a Poisson distribution, when the
parameter p of a geometric distribution increases the mean of the distribution decreases. So when you
set up a hypothesis test to see if the parameter p of a geometric distribution has increased you need
to look to see if the test statistic falls in the lower end of the distribution, and vice versa.


The following results are particularly useful when carrying out a hypothesis test for the mean of a 
geometric distribution: 


■ If X ~ Geo(p)
• P(X = x) = p(1 − p)^(x − 1)
• P(X ≤ x) = 1 − (1 − p)^x
• P(X ≥ x) = (1 − p)^(x − 1)
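
These three results translate directly into code. The sketch below is illustrative only: the function names `geo_pmf`, `geo_cdf` and `geo_sf` are assumptions, not notation from this book. It also checks that the results are consistent with each other, since P(X ≥ x) = 1 − P(X ≤ x − 1).

```python
def geo_pmf(x, p):
    """P(X = x): first success on trial x, for X ~ Geo(p)."""
    return p * (1 - p) ** (x - 1)

def geo_cdf(x, p):
    """P(X <= x): at least one success in the first x trials."""
    return 1 - (1 - p) ** x

def geo_sf(x, p):
    """P(X >= x): the first x - 1 trials are all failures."""
    return (1 - p) ** (x - 1)

p = 1 / 6                      # e.g. rolling a 6 on a fair dice
x = 4
# Summing the probability function should reproduce the cumulative result
print(sum(geo_pmf(k, p) for k in range(1, x + 1)), geo_cdf(x, p))
# The two tail results are complements of each other
print(geo_sf(x, p), 1 - geo_cdf(x - 1, p))
```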


Example 


A company claims that 1 in 8 of their packets of crisps contains a prize ticket. Lee decides to test
this claim and he buys a packet of crisps each day until he finds a prize ticket. Lee finds a prize ticket 
for the first time on the 24th day. Test, at the 5% significance level, whether there is any evidence to 
suggest that the company is overstating the proportion of packets containing a prize ticket. 


Let the random variable X = the number of packets of crisps opened until finding
a prize ticket.

H₀: p = 1/8   H₁: p < 1/8

Assume H₀, so that X ~ Geo(1/8).
Significance level 5%

P(X ≥ 24) = (1 − 1/8)^23 = (7/8)^23
= 0.0464 (4 d.p.)

0.0464 < 0.05

There is sufficient evidence to reject H₀, and conclude that the company is
overstating the proportion of packets containing a prize ticket.

Watch out: As p decreases, the mean of Geo(p) increases. To test for p < 1/8 you need to
consider the probability that X is greater than or equal to the observed value.


An electronics company makes small components for use in computers. It claims that the 
percentage of defective components coming off the production line is 0.05%. The electronics 
company sells the components to a retailer. The retailer suspects that the percentage defective 
stated by the electronics company should be higher. From a very large consignment recently 
purchased, the retailer tests the components until he finds a defective component. He finds a 
defective component on the 90th component that he tests. Is there any evidence to suggest that the 
retailer’s suspicions are correct? Test at the 5% level of significance. 
Let the random variable X = the number of components tested until finding a defective
component.

H₀: p = 0.0005   H₁: p > 0.0005
Assume H₀, so that X ~ Geo(0.0005).
Significance level 5%
P(X ≤ 90) = 1 − (1 − 0.0005)^90
= 1 − 0.9995^90
= 0.0440 (4 d.p.)


0.0440 < 0.05


There is sufficient evidence to reject H₀
and conclude that the retailer's suspicions
are correct, i.e. the percentage defective is
greater than 0.05%.
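
The two worked examples above use opposite tails, which is easy to get the wrong way round. As a hedged sketch (the function `geo_test` and its `alternative` labels are hypothetical names chosen for illustration), the choice of tail can be written out explicitly:

```python
def geo_test(observed, p0, alternative, alpha=0.05):
    """One-tailed test of H0: p = p0 using a single observation of X ~ Geo(p0).

    alternative="less"    tests H1: p < p0, so unusually LARGE x is evidence
                          against H0 and the probability used is P(X >= x).
    alternative="greater" tests H1: p > p0, so unusually SMALL x is evidence
                          against H0 and the probability used is P(X <= x).
    Returns (probability, reject_H0).
    """
    if alternative == "less":
        prob = (1 - p0) ** (observed - 1)
    elif alternative == "greater":
        prob = 1 - (1 - p0) ** observed
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return prob, prob < alpha

# Crisps example: first prize ticket on day 24, H1: p < 1/8
prob_crisps, reject_crisps = geo_test(24, 1 / 8, "less")
print(f"{prob_crisps:.4f}", reject_crisps)

# Components example: first defective on component 90, H1: p > 0.0005
prob_comp, reject_comp = geo_test(90, 0.0005, "greater")
print(f"{prob_comp:.4f}", reject_comp)
```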


Exercise 4C


1 A single observation is taken from a geometric distribution Geo(p) and a value of 9 is obtained.
Use this observation to test H₀: p = 0.25 against H₁: p < 0.25, using a 5% level of significance.


2 A random variable X has a geometric distribution Geo(p). A single observation of X = 6 is
taken from the distribution. Test, at the 5% level of significance, H₀: p = 0.6 against
H₁: p < 0.6.


3 A single observation is taken from a geometric distribution Geo(p) and a value of 3 is
obtained. Use this observation to test H₀: p = 0.01 against H₁: p > 0.01 using a 5% level of
significance.


4 A random variable has distribution X ~ Geo(p). A single observation of X = 18 is obtained.
Use this observation to test H₀: p = 0.15 against H₁: p < 0.15 using a 5% level of significance.


5 A random variable has distribution X ~ Geo(p). A single observation of X = 2 is obtained.
Test, at the 5% level of significance, H₀: p = 0.02 against H₁: p > 0.02.


6 A dice used in a board game is suspected of not giving the number 6 often enough. A player
throws the dice and finds that she gets her first 6 on her 20th throw. Does this give significant
evidence, at the 5% level of significance, that the probability of getting a 6 is less than 1/6?


7 It is claimed that a computer program produces at random a letter from the list A, B, C, D, E.
It is found that the first A occurs as the 15th letter after the computer is set running. Is there
any evidence to suggest, at the 5% level of significance, that the probability of getting an A is
less than 1/5?


On average Lucy scores a goal in one of every 4 attempts from a free kick. 
a State a suitable distribution to model the number of attempts needed to score her first 

goal. (2 marks) 
b Find the probability that she scores her first goal on her 5th attempt. (2 marks) 


After an injury, Lucy scores her first goal from a free kick on her 10th attempt. 
c Test, at the 5% level of significance, whether the probability of her scoring a goal from a free
kick is now less than 1/4. (3 marks)


It is claimed that 1 in 4 scratch cards is a winner. A statistics student decides to test this claim
because she suspects the probability is less than this. She buys one scratch card every day and 
finds that she gets her first win on the 12th day. Use a 5% level of significance to test whether 
the student’s suspicion is valid. 


It is claimed by Wisetalk that 22% of the population own one of their phones. People are 
selected one at a time, and asked if they own a Wisetalk phone. The number of people 
questioned, up to and including the first person to own a Wisetalk phone, was found to be 14. Is 
there any evidence at the 5% level of significance that Wisetalk are overstating the percentage? 


Marie claims that she scores a penalty on 30% of her attempts. One of her rivals claims that she is 
overstating her ability. In an attempt to prove her case, Marie takes consecutive shots until she 
scores her first penalty. She scores her first penalty on her 10th shot. Test her rival’s claim, using a 
5% level of significance and clearly stating your null and alternative hypotheses. (4 marks) 


Imelda is a bird watcher. The probability that she will see a robin on any given day is Z. 
The random variable X represents the number of days until Imelda first sees a robin. 
a Write down a suitable distribution to model X, and give two conditions that are necessary for 


this model to be valid. (3 marks) 
b Calculate the probability that Imelda sees her first robin 

i on the third day (2 marks) 

ii after the fourth day. (2 marks) 


Imelda further claims that the probability that she will see a magpie on any given day is i. It is 

decided to test Imelda’s claim. It is found from her records that it took 12 days until she saw her 

first magpie. 

c Is there any evidence at the 5% level of significance that Imelda is overstating the probability 
of seeing a magpie on any given day? State your hypotheses clearly. (4 marks) 


4.4 Finding critical regions for a geometric distribution


You need to be able to find a critical region for a hypothesis test of the parameter of a geometric 
distribution. 


Ayesha wishes to test if a five-sided spinner numbered from 1 to 5 is fair. She spins the spinner and
counts the number of trials until she gets the spinner to show the number 1.


a Using a 5% level of significance, find the critical region for a one-tailed test of the hypothesis
that the probability of getting the number 1 on a single spin is less than 1/5.

b Write down the actual significance level of the test in part a.
a Let the random variable X = the number of
spins until getting the number 1.

H₀: p = 1/5   H₁: p < 1/5

Assume H₀, so that X ~ Geo(1/5).
Significance level 5%
Require P(X ≥ c) < 0.05
So (1 − 1/5)^(c − 1) < 0.05
(0.8)^(c − 1) < 0.05
(c − 1) log 0.8 < log 0.05
c − 1 > log 0.05 / log 0.8
c − 1 > 13.425...
c > 14.425...

Hence critical region is X ≥ 15.

b Actual significance level = P(X ≥ 15)
= (1 − 1/5)^14 = (0.8)^14 = 0.0440


Define the test statistic.

State your hypotheses.

5% is the nominal significance level for the test.

You need to find a critical value, c, such that the
probability of needing c spins or more is less than 5%.

For a geometric distribution, X ~ Geo(p),
P(X ≥ x) = (1 − p)^(x − 1)

Take logs of both sides.

Watch out: log 0.8 is negative so reverse the
direction of the inequality when dividing by log 0.8.

You could also use c − 1 > log_0.8(0.05), but be
careful with the direction of the inequality.

c must be an integer.

Online: Explore critical regions for a
geometric distribution using GeoGebra.
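
The logarithm argument for an upper-tail critical region can be sketched in Python as follows. The function name `upper_critical_value` is an assumption made for this illustration. Because the inequality c − 1 > log α / log(1 − p) is strict and c must be an integer, the smallest valid c is found by rounding the bound down and adding 2.

```python
from math import floor, log

def upper_critical_value(p0, alpha=0.05):
    """Smallest integer c with P(X >= c) = (1 - p0)**(c - 1) < alpha.

    Used when testing H1: p < p0 for X ~ Geo(p0).
    """
    bound = log(alpha) / log(1 - p0)   # c - 1 must be strictly greater than this
    return floor(bound) + 2            # smallest integer c with c - 1 > bound

p0 = 1 / 5
c = upper_critical_value(p0)
actual = (1 - p0) ** (c - 1)           # actual significance level P(X >= c)
print(f"Critical region: X >= {c}")
print(f"Actual significance level: {actual:.4f}")

# Sanity check: c - 1 would not be extreme enough to reject H0
print((1 - p0) ** (c - 2) >= 0.05)
```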


In a particular city, a Lobster Card is used as a method of payment on trains. The company that
administers the card claims that only 1 in 1000 cards will be rejected by the card reader at the train
station. A station manager feels that the company is understating the proportion of cards rejected
by the card reader, and decides to carry out a test. When he comes on shift at 5:00am he counts the
number of passengers who pass through until he notes a passenger who has a Lobster Card rejected.


a Using a 5% level of significance, find the critical region for a one-tailed test of the hypothesis
that the proportion of Lobster Cards rejected by the card reader is greater than 1 in 1000.


b Write down the actual significance level of the test in part a. 


a Let the random variable X = the number
of cards read by the reader until one is
rejected.

H₀: p = 0.001   H₁: p > 0.001

Assume H₀, so that X ~ Geo(0.001).

Significance level 5%
Require P(X ≤ c) < 0.05
So 1 − (1 − 0.001)^c < 0.05
1 − 0.999^c < 0.05
0.999^c > 0.95
c log 0.999 > log 0.95
c < log 0.95 / log 0.999
c < 51.26...

Hence critical region is X ≤ 51.

b Actual significance level = P(X ≤ 51)
= 1 − 0.999^51 = 0.0497 (4 d.p.)


Define the test statistic and state your hypotheses.

c is the critical value. You are testing p > 0.001 so
you need to consider P(X ≤ c).

For a geometric distribution, X ~ Geo(p),
P(X ≤ x) = 1 − (1 − p)^x

Take logs of both sides.
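
A matching sketch for a lower-tail critical region is given below. Again the function name `lower_critical_value` is hypothetical. Here the strict inequality is c < log(1 − α) / log(1 − p), so the largest valid integer c is the bound rounded up, minus 1.

```python
from math import ceil, log

def lower_critical_value(p0, alpha=0.05):
    """Largest integer c with P(X <= c) = 1 - (1 - p0)**c < alpha.

    Used when testing H1: p > p0 for X ~ Geo(p0).
    A result of 0 would mean there is no critical region at this level.
    """
    bound = log(1 - alpha) / log(1 - p0)   # c must be strictly less than this
    return ceil(bound) - 1                 # largest integer c with c < bound

p0 = 0.001
c = lower_critical_value(p0)
actual = 1 - (1 - p0) ** c                 # actual significance level P(X <= c)
print(f"Critical region: X <= {c}")
print(f"Actual significance level: {actual:.4f}")

# Sanity check: extending the region to c + 1 would exceed the 5% level
print(1 - (1 - p0) ** (c + 1) >= 0.05)
```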


Exercise 4D


1 A random variable has distribution X ~ Geo(p). A single observation is used to test H₀: p = 0.3
against H₁: p < 0.3.
a Using a 5% level of significance, find the critical region for this test.
b Calculate the actual significance level of this test.


A random variable has distribution X ~ Geo(p). A single observation is used to test H₀: p = 0.35
against H₁: p < 0.35.

a Using a 5% level of significance, find the critical region for this test.

b Calculate the actual significance level of this test.


A random variable has distribution X ~ Geo(p). A single observation is used to test H₀: p = 0.05
against H₁: p > 0.05.

a Using a 10% level of significance, find the critical region for this test.

b Calculate the actual significance level of this test.


Each day Arun enters a ballot for a concert ticket. It is claimed by the concert organisers that 
the probability of winning a ticket each day is 0.23. Arun decides to test this claim. 


a Find the critical region, at the 5% level of significance, for the number of days that Arun has 
to wait before winning a ticket in order for him to claim that the organisers are overstating 
the probability of winning a ticket. (6 marks) 


b Find the probability of incorrectly rejecting the null hypothesis in this test. (2 marks) 
Arun waits 11 days before winning a ticket. 
c Comment on this in light of your critical region. (2 marks) 


Dot is a professional darts player. She claims that the probability that she will hit the bullseye 

with a single dart is ‘. Her arch rival Sharon claims that Dot is exaggerating her ability to hit 

the bullseye. To test this claim, Dot throws darts until she hits a bullseye. 

a Find the critical region, at the 5% level of significance, for the number of darts thrown by 
Dot in order to accept Sharon’s claim. (6 marks) 


b Find the actual significance level for this test. (2 marks) 


Rita has a medical condition, a symptom of which is to have a tremor. The probability that 
Rita has a tremor on any given day is 0.6. A new drug to help treat Rita’s condition becomes 
available. Rita wants to test, at the 5% significance level, whether the new drug has reduced the 
probability that she will have a tremor on any given day. After allowing a period of adjustment,
Rita is observed to see how many days it will be before she has a tremor. 


Find the critical region for this test. (6 marks) 


Challenge 


A single observation is to be taken from a geometric distribution X ~ Geo(p). This observation is
used to test H₀: p = 0.009 against H₁: p ≠ 0.009.


a Using a 5% level of significance, find the critical regions for this test. The probability of
rejection in either tail should be as close as possible to 2.5%.


b Find the probability of incorrectly rejecting the null hypothesis on this test. 


The actual value of X obtained is 5. 


c Based on this observation state, with reasons, whether there is sufficient evidence to reject Ho. 


Mixed exercise 4


1 Vehicles pass a particular point on a road at a rate of 39 vehicles per hour.

Find the probability that in any randomly selected 10-minute interval 

a exactly 6 cars pass this point (2 marks) 
b at least 8 cars pass this point. (2 marks) 


After the introduction of a new one-way system, it is suggested that the number of vehicles 
passing this point has decreased. 


During a randomly selected 10-minute interval 2 vehicles pass the point. 


c Test, at the 5% level of significance, whether or not there is evidence to support the suggestion 
that the number of vehicles has decreased. State your hypotheses clearly. (4 marks) 


An effect of a certain disease is that a small number of the red blood cells are deformed. 
Francesca has this disease and the deformed blood cells occur randomly at a rate of 3.2 per ml 
of her blood. Following a course of new treatment, a random sample of 2.5 ml of Francesca’s 
blood is found to contain only 4 deformed red blood cells. 


Stating your hypotheses clearly and using a 5% level of significance, test whether or not there 
has been a decrease in the number of deformed red blood cells in Francesca’s blood. (4 marks) 


The probability that Peter completes the crossword successfully each day in his daily 

newspaper is 5. 

A new crossword setter has been appointed. It takes Peter 7 days until he completes his first 
crossword successfully. Is there evidence to suggest that the crosswords are now more difficult? 
Test at the 5% level of significance, stating your hypotheses clearly. (4 marks) 


During the winter in the ski resort of Glen Hoe, the probability that snow falls on any one day 
is 0.45. Roisin starts her winter break in Glen Hoe on 1st December. 


a Calculate the probability that the first fall of snow that Roisin sees is on or after 
3rd December. (2 marks) 




A meteorologist feels that due to changes in climate, the probability of seeing a fall of snow on 
any day in winter in Glen Hoe is now less than 0.45. Roisin eventually sees her first fall of snow 
on December 7th. 


b Test, at the 5% significance level, whether there is sufficient evidence to suggest that the 
meteorologist is correct. (4 marks) 


Scoobie is the receptionist at a large company. Records show that over the many years that 

Scoobie has worked for the company the probability of his connecting to the wrong extension 

is 0.03. Find, using a Poisson approximation to the binomial distribution, the probability that 

in a day when Scoobie receives 150 calls he puts through: 

a 5 calls to the wrong extension (3 marks)

b no more than 3 calls to the wrong extension. (3 marks) 

Scoobie retires and a new receptionist, Waldo is appointed. The company monitors his first 300 

calls and finds that he puts through 4 calls to the wrong extension. 

c Test, using a Poisson approximation to the binomial distribution, whether there is any 
evidence to suggest that Waldo has decreased the rate at which calls are put through to the 
wrong extension. Test at the 5% level of significance. (3 marks) 


Arnold is a printer. Breakdowns on his printing press occur at an average rate of 1.75 per 
month. Assuming a Poisson distribution, find the probability that: 
a exactly 3 breakdowns occur in a particular month (2 marks) 
b more than 5 breakdowns occur in a two-month period (3 marks) 
c in four consecutive months there are two months in which there are exactly 3 

breakdowns. (3 marks) 
Arnold has his printing press serviced and wants to test whether the rate of breakdowns has 
been reduced. He records the number of breakdowns in the next four months. 
d Find the critical region, at the 5% significance level, for Arnold’s test (3 marks) 
e State the actual significance level of Arnold’s test. (1 mark) 


An electrical goods retailer sells a mean of 3.5 television sets per day on a weekday. 

The retailer decides to do some advertising in a local paper. In a two-day period following 

the advertising, the retailer sells 11 television sets, leading him to believe that the advert has 
increased his sales. Stating your hypotheses clearly, test at the 5% level of significance whether or 
not there is evidence of an increase in sales following the appearance of the advert. (4 marks) 


A company’s website is visited on weekdays, at a rate of 8.5 visits per minute. In a random one 

minute on a Saturday the website is visited 12 times. 

a Test, at the 5% level of significance, whether or not there is evidence that the rate of visits is 
greater on a Saturday than on a weekday. (3 marks) 

b State the minimum number of visits that would be required in a given minute to obtain a 
significant result. (2 marks) 


A manager thinks that 5% of her workforce are absent for at least one day each month. She chooses 
200 workers at random and finds that in the last month 15 workers had been absent for at least 
one day. 

Using a Poisson approximation to the binomial distribution, test at the 5% level of significance 
whether the percentage of workers who are absent for at least one day each month is higher 
than the manager thinks. (4 marks) 
It is known that 15% of products produced by a machine are defective. Products are tested, one
at a time, until the first defective one is encountered. 


The random variable X represents the number of products tested until the first defective one is 
found. Find: 


a P(X = 5) (2 marks)
b P(Y= 3) (2 marks) 


The machine is serviced and it is hoped that this has reduced the proportion of defective 
products. 


c Find the critical region for a hypothesis test that the proportion of defective products has 
reduced. Use a 5% level of significance. (6 marks) 


d Find the actual significance level of this test. (2 marks) 


Over a number of years, the mean number of hurricanes experienced in a certain area during 
the month of August is 4. A scientist suggests that, due to global warming, the number of 
hurricanes will have increased, and proposes to do a hypothesis test based on the number of 
hurricanes this year. 


a Suggest suitable hypotheses for this test. (2 marks) 


b Find to what level the number of hurricanes must increase for the null hypothesis to be 
rejected at the 5% level of significance. (3 marks) 


The actual number of hurricanes this year was 8. 
c Comment on this observation in light of your answer to part b. (2 marks) 


A coin is believed to be biased. Alison and Paul want to test the coin to see if the probability of 
it landing on heads, p, is significantly less than s. They both use a 2% significance level. 


Alison spins the coin 30 times and records the number of heads. 
a Find the critical region for Alison’s test. (2 marks) 
Paul spins the coin until it lands on heads for the first time. 
b Find the critical region for Paul’s test. (6 marks) 
Alison and Paul both observe values that lie within their respective critical regions. As a result, 
they reject the assumption that p = s. 
c Find the probability that Paul and Alison have incorrectly rejected the assumption that 

p= . (5 marks) 


Challenge 


An oil company found that in a certain region there is an 18% chance of 
striking oil when a well is drilled. The company has now started drilling in a 
neighbouring region and wishes to test if there is greater chance in this region 
of striking oil when a well is sunk. It decides to count the number of wells, N, 
sunk until it strikes oil for the third time. 


a Suggest a suitable distribution for N, stating any assumptions that are 


necessary. 


b Using a 5% level of significance find the critical region for this test. 


c Write down the actual significance level of the test.




Summary of key points 


1 To carry out a hypothesis test for a given parameter, θ, you form two hypotheses.
• The null hypothesis, H₀: θ = m, is the value of the parameter that you assume to be true.
• The alternative hypothesis, H₁, tells you about the value of the parameter if your
assumption is shown to be wrong.


2 • A one-tailed test has an alternative hypothesis H₁: θ < m or H₁: θ > m. There is a single part
to the critical region and one critical value.
• A two-tailed test has an alternative hypothesis H₁: θ ≠ m. There are two parts to the critical
region and two critical values.


3 The actual significance level of a test is the probability of incorrectly rejecting H₀.

4 If X ~ Geo(p)
• P(X = x) = p(1 − p)^(x − 1)
• P(X ≤ x) = 1 − (1 − p)^x
• P(X ≥ x) = (1 − p)^(x − 1)
Central limit theorem


After completing this chapter you should be able to:
• Understand and apply the central limit theorem to
approximate the sample mean of a random variable, X
• Apply the central limit theorem to other distributions




Prior knowledge check


1 A random variable X ~ N(120, 8²). Find:
a P(X > 115)   b P(120 < X < 130)
c a such that P(X < a) = 0.25
← Statistics and Mechanics Year 2, Chapter 3


2 A fair six-sided dice is rolled. Let X be the
score on the uppermost face, and let
Y = 1 − 3X. Find:
a E(Y)   b Var(Y)
c P(Y < −5)   ← Section 1.3

3 Robin flips a fair coin until he gets five heads.
Find the probability that the coin is flipped at
least 12 times.   ← Section 3.3

The central limit theorem gives information about the distribution
of the sample mean, even when the distribution of the population is
unknown. Statisticians use it to infer how likely the views of a sample are to
be representative of the population.   → Mixed exercise, Q11




5.1 The central limit theorem


If you take a random sample of n observations from
a normally distributed random variable X ~ N(μ, σ²),
then the sample mean, X̄, is also normally distributed
with X̄ ~ N(μ, σ²/n).

Links: This result is used when hypothesis
testing for the mean of a normal distribution.
← Statistics and Mechanics Year 2, Section 3.7


In fact, this result is a special case of a more powerful result called the central limit theorem. 
This states that the mean of a large random sample taken from any random variable is always 
approximately normally distributed. This result is true regardless of the distribution of the original 
random variable. 


■ The central limit theorem says that if X₁, X₂, ..., Xₙ is a random sample of size n from a
population with mean μ and variance σ², then X̄ is approximately ~ N(μ, σ²/n).


Note that in general the sample mean is only
approximately distributed with N(μ, σ²/n). As n
gets larger, this approximation gets better.

Watch out: You can see that this is only an
approximation by considering n = 1. In this
case, each sample is a single observation,
so the sample mean will have the same
distribution as the original random variable.


The variance of the sample mean also decreases as n gets large. You can say that for a large sample, 
the sample mean will be very close to the population mean. 
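
The claim that the variance of the sample mean shrinks like σ²/n can be explored with a small simulation. This is a rough sketch under stated assumptions, not a proof: it uses rolls of a fair six-sided dice (population variance 35/12), and the function name `sample_mean_variance`, the number of trials and the seed are arbitrary illustrative choices.

```python
import random
from statistics import pvariance

def sample_mean_variance(n, trials=5000, seed=0):
    """Estimate Var(sample mean) for samples of n fair dice rolls."""
    rng = random.Random(seed)
    means = [sum(rng.randint(1, 6) for _ in range(n)) / n
             for _ in range(trials)]
    return pvariance(means)

sigma2 = 35 / 12   # variance of a single fair dice roll
for n in (1, 5, 30):
    simulated = sample_mean_variance(n)
    print(n, round(simulated, 4), round(sigma2 / n, 4))
```

Each printed row shows the sample size, the simulated variance of the sample mean, and the theoretical value σ²/n, which should be close to each other.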


A sample of size 9 is taken from a population with distribution N(10, 2²). Find the probability that
the sample mean X̄ is more than 11.


The population is normal, so X̄ will have a
normal distribution despite the small size of
the sample.
Var(X̄) = σ²/n = 2²/9 = (2/3)²
So X̄ ~ N(10, (2/3)²)
The mean of X̄ is 10 and the standard
deviation is 2/3 so:
P(X̄ > 11) = 1 − P(X̄ < 11)
= 1 − 0.9332
= 0.0668 (4 d.p.)

The mean of X̄ is μ (= 10) and the variance of X̄
is σ²/n.

Use the normal distribution function on your
calculator to find the probability.

Watch out: In this case the distribution of the
sample mean is not an approximation. This is only
true when the population is normally distributed.
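
In place of a calculator's normal distribution function, the same probability can be found with `NormalDist` from Python's standard `statistics` module. This is a minimal sketch. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 10, 2, 9

# The sample mean has mean mu and standard deviation sigma / sqrt(n)
sample_mean = NormalDist(mu, sigma / sqrt(n))

p = 1 - sample_mean.cdf(11)            # P(sample mean > 11)
print(f"P(sample mean > 11) = {p:.4f}")
```

Note that `NormalDist` takes the standard deviation as its second argument, not the variance.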




Example 


A six-sided dice is relabelled so that there are three faces marked 1, two faces marked 3 and one 
face marked 6. The dice is rolled 40 times and the mean of the 40 scores is recorded. 


a Find an approximate distribution for the mean of the scores. 


b Use your approximation to estimate the probability that the mean is greater than 3. 


a Let the random variable X = the score on a single roll; then the distribution of X is:

x          1     3     6
P(X = x)   1/2   1/3   1/6

So: μ = E(X) = Σ x P(X = x)
 = 1 × 1/2 + 3 × 1/3 + 6 × 1/6
 = 2.5

and σ² = Var(X)
 = Σ x² P(X = x) − μ²
 = 1² × 1/2 + 3² × 1/3 + 6² × 1/6 − (5/2)²
 = 19/2 − 25/4 = 3.25 or 13/4

Problem-solving: Find the mean and variance of the discrete distribution. ← Sections 1.1, 1.2

Now by the central limit theorem:
X̄ is approximately ~ N(2.5, 3.25/40)

The population is clearly not normally distributed but the sample size (n = 40) is quite large so the central limit theorem can be used.

b P(X̄ > 3) = 1 − P(X̄ < 3)
 = 1 − 0.9599
 = 0.0401 (4 d.p.)

You can use the normal distribution function on your calculator to find P(X̄ > 3).
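The mean and variance in part a can be verified with exact fractions. This is only a sketch of the arithmetic; the dictionary `dist` and the other names are illustrative, not part of the original solution.

```python
from fractions import Fraction

# Probability distribution of a single roll, as in the table for part a.
dist = {1: Fraction(1, 2), 3: Fraction(1, 3), 6: Fraction(1, 6)}
n = 40  # number of rolls

mu = sum(x * p for x, p in dist.items())                 # E(X)
var = sum(x * x * p for x, p in dist.items()) - mu ** 2  # Var(X)
var_mean = var / n                                       # variance of the sample mean

print(mu, var, var_mean)
```

The three printed fractions are the mean and variance of a single roll and the variance used for the sample mean in the central limit theorem.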


Watch out: You do not need to apply a continuity correction when using the central limit theorem. This is because the underlying distribution is the mean of the sample. Although this is a discrete random variable, it does not have to take integer values. It takes fractional values, and the gaps between values get smaller and smaller as n gets larger.


Exercise 5A


1 A sample of size 6 is taken from a population that is normally distributed with mean 10 and standard deviation 2.


a Find the probability that the sample mean is greater than 12. (3 marks) 


b State, with a reason, whether your answer is an approximation. (1 mark) 


2 A machine fills cartons in such a way that the amount of drink in each carton is distributed normally with a mean of 40 cm³ and a standard deviation of 1.5 cm³.

A sample of four cartons is examined.

a Find the probability that the mean amount of drink is more than 40.5 cm³.

A sample of 49 cartons is examined.

b Find the probability that the mean amount of drink is more than 40.5 cm³ on this occasion.
Central limit theorem


(E/P) 3 The lengths of bolts produced by a machine have an unknown distribution with mean 3.03 cm and standard deviation 0.20 cm.


A sample of 100 bolts is taken. 
a Estimate the probability that the mean length of this sample is less than 3 cm. (3 marks) 


A second sample is taken. The probability that the mean of this sample is less than 3 cm needs 
to be less than 1%. 


b Find the minimum sample size required. (5 marks) 


(E) 4 A random variable X has the discrete uniform distribution

P(X = x) = 1/5, x = 1, 2, 3, 4, 5

40 observations are taken from X, and their mean X̄ is recorded.
Find an estimate for P(X̄ > 3.2). (6 marks)


(P) 5 A fair dice is rolled 35 times.
a Find the approximate probability that the mean of the 35 scores is more than 4. 
b Find the approximate probability that the total of the 35 scores is less than 100. 


6 The 25 children in a class each roll a fair dice 30 times and record the number of sixes they obtain. 
Find an estimate of the probability that the mean number of sixes recorded for the class is less 
than 4.5. 


(E) 7 The random variable X has the probability distribution shown in the table. 
a Find the value of k. (2 marks) x 0 2 3 5 
P(X=x) | 0.1 | 3k k | 03 


A random sample of 100 observations of X is taken. 


b Use the central limit theorem to estimate the probability that the mean of these observations 
is greater than 3. (6 marks) 


c Comment on the accuracy of your estimate. (1 mark) 


(P) 8 A fair dice is rolled n times. Given that there is less than a 1% chance that the mean of all the scores differs from 3.5 by more than 0.1, find the minimum sample size.


9 The annual salaries of employees at a large company have an unknown distribution with mean £28 500 and standard deviation £6800.
A random sample of 5 members of the senior management team is taken.

A researcher suggests that N(28 500, 6800²/5) could be used to model the distribution of the sample mean.
a Give a reason why this is unlikely to be a good model. (1 mark) 
A second random sample of 15 employees from the whole company is taken. 


b Estimate the probability that the mean annual salary of these employees is: 
i less than £25 000 ii between £25 000 and £30000. (4 marks) 


c Comment on the accuracy of your estimate. (1 mark) 




(E/P) 10 An electrical company repairs very large numbers of television sets and wishes to estimate the mean time taken to repair a particular fault. It is known from previous research that the standard deviation of the time taken to repair this particular fault is 2.5 minutes.
The manager wishes to ensure that the probability that the estimate differs from the true mean by less than 30 seconds is 0.95.

Find how large a sample is required. (6 marks)


5.2 Applying the central limit theorem to other distributions


You can use the central limit theorem to solve problems involving the Poisson, geometric, binomial and negative binomial distributions.


A supermarket manager is trying to model the number of customers that visit her store each day. 
She observes that, on average, 20 new customers enter the store every minute. 


a Calculate the probability that fewer than 15 customers arrive in a given minute. 


b Find the probability that in one hour no more than 1150 customers arrive. 


c Use the central limit theorem to estimate the probability that in one hour no more than 1150 customers arrive. Compare your answer to part b.


a Let X denote the number of customers that arrive in a minute. Then X ~ Po(20).
P(X < 15) = 0.1049 (4 d.p.)

It's reasonable to assume that customers arrive independently of each other at a constant rate, so the number of customers arriving each minute will have a Poisson distribution. ← Chapter 2

b Let T denote the number of customers that arrive in an hour.
Then T ~ Po(60 × 20)
so T ~ Po(1200).
P(T ≤ 1150) = 0.0758 (4 d.p.)

c Consider a sample of 60 observations taken from X.

You could also consider the number of customers who arrive in one hour as a sample of 60 observations from X, the number who arrive in one minute.

By the central limit theorem X̄ is approximately ~ N(20, 20/60), or N(20, 1/3).
If T ≤ 1150 then X̄ ≤ 1150/60 = 19.1666...
So P(T ≤ 1150) = P(X̄ ≤ 19.1666...)
 = 0.0745 (4 d.p.)
The two answers are close, so the approximation from the central limit theorem is quite good.
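The comparison between parts b and c can be reproduced in code. The sketch below assumes only the Python standard library; `poisson_cdf` is a hypothetical helper written for this illustration, and it sums the Poisson probabilities in log space because e^(−1200) is too small to represent directly.

```python
import math
from statistics import NormalDist

def poisson_cdf(k: int, lam: float) -> float:
    """P(T <= k) for T ~ Po(lam), summed in log space to avoid underflow."""
    return sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        for i in range(k + 1)
    )

lam_per_minute, minutes, limit = 20, 60, 1150

# Part b: exact probability using T ~ Po(1200).
exact = poisson_cdf(limit, lam_per_minute * minutes)

# Part c: central limit theorem, mean of 60 observations ~ N(20, 20/60).
sample_mean = NormalDist(mu=lam_per_minute,
                         sigma=math.sqrt(lam_per_minute / minutes))
approx = sample_mean.cdf(limit / minutes)

print(round(exact, 4), round(approx, 4))
```

The two printed values correspond to the exact Poisson answer and the central limit theorem estimate respectively.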


Problem-solving 


If ΣXᵢ is the sum of the observations from a sample of size n, then the sample mean is given by X̄ = ΣXᵢ / n.


Example 


Billy is the captain of a football team. Each week he gets a team together by calling his friends one by one and asking if they would like to play. The probability of each friend agreeing to play is 2/3.
Once he has 10 other players he stops calling.


a Calculate the number of friends Billy expects to have to call to find 10 other players. 

b Find the probability that Billy has to call exactly 12 friends. 

In a season, Billy’s team plays 25 matches. 

c Estimate the probability that the mean number of calls per match Billy had to make was less than 15.5.


a Let X be the number of friends Billy calls.
Then X ~ Negative B(10, 2/3), so
E(X) = 10 ÷ (2/3) = 15.

The number of friends Billy calls is the number of trials required for 10 successes with probability 2/3 of success, which has a negative binomial distribution. If X ~ Negative B(r, p), then E(X) = μ = r/p and Var(X) = σ² = r(1 − p)/p². ← Section 3.3

b P(X = 12) = (11 choose 9) × (2/3)¹⁰ × (1/3)²
 = 0.1060 (4 d.p.)

c E(X) = 15, and Var(X) = 10 × (1/3) ÷ (2/3)² = 7.5

For a sample of size 25, the sample mean X̄ is approximately ~ N(15, 7.5/25), or N(15, 0.3), by the central limit theorem.
P(X̄ < 15.5) = 0.8193 (4 d.p.)

Use the normal distribution function on your calculator.
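The three parts of this example can be checked with a short script. This is a sketch only, assuming p = 2/3 as in the solution; the variable names are illustrative.

```python
import math
from statistics import NormalDist

r, p = 10, 2 / 3      # successes needed, probability a friend agrees
matches = 25          # sample size for part c

# Part a: E(X) = r / p for X ~ Negative B(r, p)
mean = r / p
# Part b: P(X = 12) = C(11, 9) * p^10 * (1 - p)^2
p_12 = math.comb(12 - 1, r - 1) * p ** r * (1 - p) ** (12 - r)
# Part c: Var(X) = r(1 - p) / p^2, then the CLT for the mean of 25 values
var = r * (1 - p) / p ** 2
sample_mean = NormalDist(mu=mean, sigma=math.sqrt(var / matches))
p_less = sample_mean.cdf(15.5)

print(round(mean, 4), round(p_12, 4), round(p_less, 4))
```

The printed values should match the answers to parts a, b and c to 4 decimal places.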


Exercise 5B


1 A random sample of 10 observations is taken from a Poisson distribution with mean 3. 
a Find the exact probability that the sample mean does not exceed 2.5. 


b Estimate the probability that the sample mean does not exceed 2.5 using the central limit 
theorem, and compare your answer to part a. 


2 A random sample of 12 observations is taken from a random variable X ~ Geo(0.25).

a Find the mean and variance of X.

b Estimate the probability that the sample mean is greater than 5 using the central limit theorem.

Hint: X has a geometric distribution with parameter p = 0.25, so E(X) = μ = 1/p and Var(X) = σ² = (1 − p)/p². ← Section 3.2


3 A sample of size 20 is taken from a binomial distribution with n = 10 and p = 0.2. Estimate the 
probability that the sample mean does not exceed 2.4. (4 marks) 
(E/P) 4 There are 20 children in a class. Each flips a fair coin until they get heads 5 times.
a Write down the expected number of times each student will have to flip the coin. (2 marks) 
b Find an estimate of the probability that the mean number of flips is at most 9. (3 marks) 


(E/P) 5 A town is hit by three thunderstorms per month, on average.
a Find the probability that there are four thunderstorms next month. (2 marks) 


b Use the central limit theorem to estimate the probability that over the course of a year, the 
average number of thunderstorms each month is at most 2.5. (4 marks) 


6 A patient is awaiting a liver transplant. The probability that a randomly selected donor is a match is 0.2.
a Find the expected number of donors that will have to be tested before finding a match. (2 marks)
A random sample of 20 patients awaiting liver transplants was selected, and the number of 
donors tested for each patient before a match was found was recorded. 


b Estimate the probability that the average number of donors to be tested per patient is more 
than 5.5. (3 marks) 


7 David is selling raffle tickets from door to door to raise money for charity. To reach his daily
fundraising goal, he needs to sell 10 tickets. He observes that, on average, an occupant in one in 
every three houses he visits will buy a ticket. 


a Find the probability that on a given day he reaches his daily goal after visiting exactly 
35 houses. (2 marks) 
b In one month, David worked on 20 days, and met his daily goal on each day. Estimate the 
probability that the average number of houses he visited per day was 35 or fewer. (4 marks) 


(E/P) 8 Telephone calls arrive at an exchange at an average rate of two per minute. Over a period of 30 days a telephonist records the number of calls each day that arrive in the five-minute period before her break.

a Find an approximation for the probability that the total number of calls recorded is more than 350. (2 marks)
b Estimate the probability that the mean number of calls received in this period each day is less than 9.0. (4 marks)


Mixed exercise 5


(E) 1 A random sample of 100 observations is taken from a probability distribution with mean 5 and variance 1. Estimate the probability that the mean of the sample is greater than 5.2. (3 marks)


(E/P) 2 A fair six-sided dice numbered 1, 2, 4, 5, 7, 8 is rolled 20 times. Estimate the probability that the average score is less than 4. (4 marks)


3 A sample of size n is taken from a normal distribution with μ = 1 and σ = 1. Find the minimum sample size such that the probability of the sample mean being negative is less than 5%. (3 marks)


4 In a group of 20 students, each rolls a fair six-sided dice 10 times and records the number of sixes. Estimate the probability that the average number of sixes rolled by each student is greater than 2. (4 marks)


(E) 5 Buses arrive at a bus stop on average once every 5 minutes.
a Find the probability that exactly 3 buses arrive in the next 10 minutes. (2 marks)

b Use the central limit theorem to estimate the probability that at least 25 buses arrive in the next 2 hours. (3 marks)


(E/P) 6 A married couple plan to have children, and are desperate to have a daughter. They decide they will keep having children until they have a daughter and then stop. You can assume that giving birth to a girl or boy is equally likely, and independent of the gender of any other children the couple have had.


a Find the probability that they will have more than 2 children. (2 marks) 
Suppose a group of 10 couples all decide on the same plan. 


b Estimate the probability that between them, the 10 couples have more than 24 children. 
(4 marks) 


7 The masses of eggs are normally distributed with mean 60 g and standard deviation 5 g.
A crate contains 48 randomly chosen eggs.

a Calculate the probability that the mean mass of an egg in a randomly chosen crate is greater than 59 g. (3 marks)


b State, with a reason, whether your answer to part a is an estimate. (1 mark) 
The probability that an egg has a double yolk is 0.1. A sample of 30 crates is taken. 


c Estimate the probability that the sample will contain fewer than 150 double-yolk eggs in 
total. (5 marks) 


8 An automatic coffee machine uses milk powder. The mass, S grams, of milk powder used in one cup of coffee is modelled by S ~ N(4.9, 0.8²).

‘Semi-skimmed’ milk powder is sold in 500 g packs. Find the probability that one pack will be 
sufficient for 100 cups of coffee. (4 marks) 


9 A random sample of size n is to be taken from a population with mean 40 and variance 9.
Find the minimum sample size such that the probability of the sample mean being greater 
than 42 is less than 5%. (5 marks) 


10 A sample of size 20 is taken from a population with an unknown distribution, with mean 35
and variance 9. Find the probability that the sample mean will be greater than 37. (3 marks) 
(E/P) 11 A nationwide poll asked 500 people whether they prefer white chocolate or milk chocolate.


The polling company wants to determine whether the proportion of people who prefer milk 
chocolate differs significantly from 60%. 


The polling company assumes that in the population 60% of people prefer milk chocolate, and defines the random variable X to take the value 1 if a randomly selected member of the population prefers milk chocolate, and 0 otherwise.


a Describe the distribution of X and state its mean and variance. (3 marks) 
Modelling the poll as a random sample of size 500 from the distribution in part a: 
b estimate the probability that the sample mean differs from 0.6 by 0.03 or more. (3 marks) 


c How many people should be polled in order for there to be a greater than 95% chance that 
the sample mean differs from 0.6 by at most 0.03? (5 marks) 


Challenge

Let X₁, ..., Xₙ be a random sample from a population with distribution N(μ, σ²). Show that X̄ ~ N(μ, σ²/n).

Hint: You can use the fact that if X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²).


Summary of key points 
1 The central limit theorem states that given a random sample of size n from any distribution with mean μ and variance σ², the sample mean X̄ is approximately distributed as N(μ, σ²/n).


Review exercise 


(E) 1 The random variable X has probability function

P(X = x) = (2x − 1)/36, x = 1, 2, 3, 4, 5, 6

a Construct a table giving the probability distribution of X. (2)

Find:


b P@ <2 = 4), (1) 
c the exact value of E(X). (2) 
d Show that Var(X ) = 1.97 to three 
significant figures. (3) 
e Find Var(2 - 3X). (2) 


€ Sections 1.1, 1.2, 1.3 


2 The random variable X has probability function

P(X = x) = kx for x = 1, 2, 3
P(X = x) = k(x + 1) for x = 4, 5

where k is a constant.
a Find the value of k. (2)
b Find the exact value of E(X). (2)
c Show that, to three significant figures, Var(X) = 1.47. (3)
d Find, to one decimal place, Var(4 − 3X). (2)

← Sections 1.1, 1.2, 1.3


3 The random variable X has probability distribution given by

x          1     2    3      4    5
P(X = x)   0.1   p    0.20   q    0.30

a Given that E(X) = 3.5, write down two equations involving p and q. (3)

Find:
b the value of p and the value of q (2)
c Var(X) (3)
d Var(3 − 2X) (2)

← Sections 1.1, 1.2, 1.3


4 The random variable X has probability distribution given by

x          1     3    5     7    9
P(X = x)   0.2   p    0.2   q    0.15

a Given that E(X) = 4.5, write down two equations involving p and q. (3)
Find:
b the value of p and the value of q (2)
c P(4 < X < 7). (1)
Given that E(X²) = 27.4, find:
d Var(X) (2)
e E(19 − 4X) (1)
f Var(19 − 4X). (2)

← Sections 1.1, 1.2, 1.3


The discrete random variable X has 
probability distribution given by 

x =2. -1 0 
P(X=x) | 02/03] a | b 


The random variable Y is defined as Y = 2 − 3X. Given that E(Y) = 2.9,

a find the values of a and b (5)
b calculate E(X²) and Var(X) (3)
c write down the value of Var(Y) (1)
d find P(Y + 1 < X). (2)

← Section 1.4
(E/P) 6 The discrete random variable X has probability distribution given by

x          −3   −2   0   1   3
P(X = x)   a    b    b   a   c

The random variable Y is defined as Y = 1 − 2X.

Given that E(Y) = −0.05 and P(Y > 0) = 0.5, find:
a the probability distribution of X (7)
b P(−3X < 5Y). (2)

← Section 1.4


7 Accidents on a particular stretch of motorway occur at an average rate of 1.5 per week.

a Write down a suitable model to represent the number of accidents per week on this stretch of motorway. (1)

Find the probability that
b there will be 2 accidents in the same week (2)
c there is at least one accident per week for 3 consecutive weeks (3)
d there are more than 4 accidents in a two-week period. (2)

← Sections 2.1, 2.2


8 a State two conditions under which a Poisson distribution is a suitable model to use in statistical work. (2)

The number of cars passing an observation point in a 10-minute interval is modelled by a Poisson distribution with mean 1.
b Find the probability that in a randomly chosen 60-minute period there will be
i exactly 4 cars passing the observation point (2)
ii at least 5 cars passing the observation point. (2)

The number of other vehicles (i.e. other than cars), passing the observation point in a 60-minute interval is modelled by a Poisson distribution with mean 12.

c Find the probability that exactly 1 vehicle, of any type, passes the observation point in a 10-minute period. (4)

← Sections 2.1, 2.2, 2.3


9 Two garden machinery firms hire out equipment independently of each other.

Quikmow hire out lawn-mowers at a rate of 1.5 mowers per hour.

Easitrim hire out lawn-mowers at a rate of 2.2 mowers per hour.

a In a one-hour period, find the probability that each company hires exactly 1 lawn-mower. (2)
b In a one-hour period, find the probability that between them, the two companies hire out 4 lawn-mowers. (3)
c In a three-hour period, find the probability that the total number of lawn-mowers hired out by the two companies is less than 12. (3)

← Sections 2.2, 2.3


10 A manufacturer places toys in cereal boxes. A random sample of 200 cereal boxes is taken, and the number of toys, x, in each box is observed. The data is summarised as follows:

Σx = 290   Σx² = 702

a Calculate the mean and the variance of these data. (2)
b Explain why the results in part a suggest that a Poisson distribution may be a suitable model for the number of toys in each box of cereal. (1)
c Use a suitable Poisson distribution to estimate the probability that a randomly chosen box of cereal will contain at least 2 toys. (3)

← Section 2.4


(E/P) 11 a Write down the conditions under which the Poisson distribution may be used as an approximation to the binomial distribution. (2)

A call centre routes incoming telephone calls to agents who have specialist knowledge to deal with the call. The probability of the caller being connected to the wrong agent is 0.01.

b Find the probability that 2 consecutive calls will be connected to the wrong agent. (1)
c Find the probability that more than 1 call in 5 consecutive calls are connected to the wrong agent. (2)

The call centre receives 1000 calls each day.
d Find the mean and variance of the number of wrongly connected calls. (2)
e Use a Poisson approximation to find, to three decimal places, the probability that more than 6 calls each day are connected to the wrong agent. (3)

← Sections 2.5, 2.6


12 The random variable X has a binomial distribution X ~ B(150, 0.02).

a Find P(X = 3) (2)

A Poisson random variable, Y ~ Po(λ), is used to approximate X.

b Write down the value of λ and justify the use of a Poisson approximation in this instance. (2)

← Section 2.6


13 In a manufacturing process, 1.5% of the articles produced are defective. A random sample of 200 articles is selected, and the number of defective articles, X, is recorded.

a Write down the distribution of X. (2)
b Find P(X = 4) (2)
c Explain why a Poisson distribution could be used as an approximation for X, and write down the parameter for this approximation. (2)
d Use your answer to part c to find an estimate for P(X = 4), and calculate the percentage error in your estimate. (3)

← Section 2.6


14 A computer software engineer is checking the coding of a number of apps. The percentage of apps with defective code is thought to be 5%. X represents the number of apps checked up to and including the first one with defective code.

a State the distribution that can be used to model X. (1)
b Find the mean and variance of X. (2)
c Find the probability that the engineer has to check the coding of at least 15 apps before finding a defective one. (2)

← Sections 3.1, 3.2


(E/P) 15 Anne-Marie is practising basketball and she continues to throw the ball until she gets it in the basket. The random variable Y represents the number of throws she needs.
a State a suitable distribution to model Y. (1)
Given that the mean of Y is 10,
b find the probability that Anne-Marie gets her first basket on her seventh attempt (2)
c find the variance of Y. (2)

State any assumptions you have made in using this model. (2)

← Sections 3.1, 3.2


(E/P) 16 A fair twelve-sided spinner has one side coloured red and eleven sides coloured blue. In a fairground game, players spin the spinner until it lands on red, at which point they win a prize.

a Find the probability of winning a prize if the player is limited to 10 spins. (3)

Keisha wants the probability of winning a prize to be greater than 0.75.

b Find the minimum number of spins to which she should limit players. (4)

← Section 3.3


(E/P) 17 Matt is playing a game at a school fete where his probability of winning a prize is 0.18. He plays the game several times.
a Find the probability that he wins his third prize on his thirteenth game. (2)
b State two assumptions that have to be made for the model used in part a to be valid. (2)
c Find the mean and standard deviation of the number of times Matt needs to play to win his fourth prize. (3)

Naomi plays a different game until she has won r prizes. Given that Y represents the number of games Naomi plays and that E(Y) = 20 and Var(Y) = 113⅓,
d find the probability of Naomi winning a prize. (3)

← Sections 3.3, 3.4


(E/P) 18 The random variable X is the number of times a biased dice is rolled until 3 threes have occurred. The variance of X is 23⅓.

a Find the probability of rolling a three on the biased dice. (3)
b Find P(X = 9). (2)
c Find P(X = 11 | a three occurs on the first roll). (3)

← Sections 3.3, 3.4


(E) 19 a Explain what you understand by
i an hypothesis test,
ii a critical region. (2)

During term time, incoming calls to a school are thought to occur at a rate of 0.45 per minute. To test this, the number of calls during a random 20-minute interval is recorded.

b Find the critical region for a two-tailed test of the hypothesis that the number of incoming calls occurs at a rate of 0.45 per 1-minute interval. The probability in each tail should be as close to 2.5% as possible. (5)
c Write down the actual significance level of the above test. (1)

In the school holidays, 1 call occurs in a 10-minute interval.

d Test, at the 5% level of significance, whether or not there is evidence that the rate of incoming calls is less during the school holidays than in term time. (5)

← Sections 4.1, 4.2


20 A telesales agent claims she makes a sale on 2.5% of her calls. During her last 300 calls, she made 11 sales.

a Using a Poisson approximation to the binomial distribution, test the claim at the 5% level of significance. (4)
b To the nearest multiple of 5%, what level of significance would need to be set in order to reject the claim of the telesales agent? (1)

← Sections 4.1, 4.2


21 Counter staff at a take-away shop claim to receive orders at a mean rate of 4 every 10 minutes.

To test the claim, the shop owner records the number of orders received in a 60-minute period.

a Using a 10% level of significance, find the critical region for a two-tailed test of the hypothesis that the mean number of orders received in a 60-minute period is 24. The probability of rejection in each tail should be as close as possible to 0.05. (3)
b Find the actual level of significance of this test. (1)

The actual number of orders received in this period was 18.

c Comment on the counter staff's claim in the light of this value. Justify your answer. (2)

← Section 4.2


22 At a school tombola, it is claimed that 1 in 5 tickets is a winning ticket.

a State a suitable distribution to model the number of attempts needed before finding a winning ticket. (2)

Mr Taylor, a maths teacher, decides to test this claim because he suspects the probability is less than this. He buys tickets one after the other and finds that he gets his first win on the 15th ticket.

b Use a 5% level of significance to test whether Mr Taylor's suspicion is valid. (3)

← Section 4.3


(E/P) 23 Xander is playing basketball. He claims that he makes a basket on 50% of his attempts. His team-mate claims that Xander is overstating his ability.

Xander takes consecutive shots until he makes his first basket. He scores his first basket on his 5th shot.

Test the team-mate's claim, using a 10% level of significance and clearly stating your null and alternative hypotheses. (4)

← Section 4.3


(E/P) 24 Anne and Brian work in a microchip manufacturing plant. Brian claims that the chance of a chip being faulty is 1 in 2000. Anne suspects that this is an underestimate and that it is more likely that a randomly chosen chip is faulty.

Anne decided to test Brian's claim at the 5% significance level by sampling from a large batch of chips until she finds a faulty one.

a Find the critical region for Anne's test. (5)
b Anne finds the first faulty chip after selecting her 115th. Comment on this in light of your answer to part a. (2)

← Section 4.4


(E) 25 A report on the health and nutrition of a population stated that the mean height of three-year-old children is 90 cm and the standard deviation is 5 cm. A sample of 100 three-year-old children was chosen from the population.

a Write down the distribution of the sample mean height. (2)
b Hence find the probability that the sample mean height is at least 91 cm. (3)

← Section 5.1


(E/P) 26 A sample of size 5 is taken from a population that is normally distributed with mean 10 and standard deviation 3. Find the probability that the sample mean lies between 7 and 10. (4)

← Section 5.1


(E/P) 27 The random variable X has the probability distribution shown in the table:


x i | @ | ala 
P(X = x) | 0.4 | 2k | 0.3 | k


a Find the value of k. (2)

A random sample of 200 observations of X is taken.

b Use the central limit theorem to estimate the probability that the mean of these observations is greater than 2.09. (6)
c Comment on the accuracy of your estimate. (1)

← Section 5.1
(E/P) 28 A busy call centre receives, on average, 15 calls every minute.
a Calculate the probability that fewer than 10 calls come in a given minute. (1)
b Find the probability that in one 30-minute period no more than 420 calls come in. (2)
c Use the central limit theorem to estimate the probability that in one 30-minute period no more than 420 calls come in. Compare your answer to part b. (5)

← Section 5.2


(E/P) 29 A bag contains a large number of coloured balls, red and green, in the ratio 3:1. A group of 20 students each repeatedly select a ball from the bag and then replace it, continuing until a green ball is selected.

Use the central limit theorem to estimate the probability that the mean number of attempts needed to select a green ball is more than 4.5. (5)

← Section 5.2


30 A group of students are completing a 
multiple choice quiz where there are five 
answers to each question. 


One student is chosen at random. Given 
that they guess each answer, 


a find the probability that they get 
their 4th question right on their 12th 
attempt. (2) 
b Find the expected number of questions 
they must answer to get 4 right. (2) 
There are 15 students in the group. Each 
student continues answering questions 
until they have achieved four correct 
answers. 
c Estimate the probability that the mean 
number of questions answered per 
student is less than 19. (5) 


€ Section 5.2 
Challenge


1 Three fair four-sided dice are rolled. The discrete random variable X represents the difference between the highest score and the lowest score on the three dice.

a Write down the probability distribution of X.
b Show that E(X) = 15/8. ← Sections 1.1, 1.2, 1.3


2 a If X ~ B(4, p), show that the probability distribution of X can be written as

x          0    1      2       3      4
P(X = x)   q⁴   4q³p   6p²q²   4p³q   p⁴

where q = 1 − p.

b Hence show that E(X) = 4p and Var(X) = 4p(1 − p). ← Section 1.4


3 A metal detectorist has a 12% chance of finding something valuable when he searches a particular quadrant of a field. He starts searching in another field and wishes to test if there is greater chance in this field of finding something valuable. He decides to count the number of quadrants searched until he finds his fourth item of value.

a Using a 2.5% level of significance find the critical region for this test. Use a negative binomial distribution.
b Write down the actual significance level of the test. ← Sections 3.3, 4.4


Chi-squared tests 


Objectives


After completing this chapter you should be able to:

● Form hypotheses about how well a distribution fits as a model for an observed frequency distribution and measure goodness of fit of a model to observed data
● Understand degrees of freedom and use the chi-squared (χ²) family of distributions
● Be able to test a hypothesis
● Apply goodness-of-fit tests to discrete data
● Use contingency tables
● Apply goodness-of-fit tests to geometric distributions


Prior knowledge check


1 Suppose that X ~ Po(5), find P(X < 2).
← Section 2.1


2 Suppose that X ~ Geo(;), find P(X > 5). 
€ Section 3.1 J 


3 David claims 60% of the students in his school like baked beans. He takes a random sample of 100 students and finds that 70 of them say that they like baked beans. Test David's claim, at the 5% significance level.
← Statistics and Mechanics Year 1, Chapter 7


The chi-squared test is used in genetics to help determine whether an experiment was fair and unbiased, and to provide a level of confidence for whether the results were obtained by chance. → Exercise 6A, Q4


Chapter 6


6.1 Goodness of fit


Goodness of fit is concerned with measuring how well an observed frequency distribution fits to a 
known distribution. 


Suppose you take a dice and throw it 120 times. You might get results like these: 


Number, n | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency | 23 | 15 | 25 | 18 | 21 | 18


If the dice is unbiased you would, in theory, expect each of the numbers 1 to 6 to appear the same 
number of times. 


For 120 throws the expected frequencies would each be: 


$P(X = x) \times 120 = \frac{1}{6} \times 120 = 20$

The expected results fit a uniform discrete probability distribution:

Number, n | 1 | 2 | 3 | 4 | 5 | 6
P(X = x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

You would expect results like these:

Number, n | 1 | 2 | 3 | 4 | 5 | 6
Expected frequency | 20 | 20 | 20 | 20 | 20 | 20


Since you are taking a sample, you should not be surprised that the observed frequency for each 
number doesn’t match the expected frequency exactly. 


However, if the dice were biased, you would also not expect the observed frequency of each number to be exactly 20.


Although both the results from the biased and unbiased dice would differ from the predicted results, 
the results from the unbiased dice should be better modelled by the discrete uniform distribution 
than those from a biased dice. 


We form the hypothesis that the observed frequency distribution does not differ from a theoretical one, 
and that any differences are due to natural variations. Because this assumes no difference, it is called 
the null hypothesis. 


The alternative hypothesis is that the observed frequency distribution does differ from the 
theoretical one and that any differences are due to not only natural variations but the bias of the dice 
as well. 


■ H₀: There is no difference between the observed and the theoretical distribution.

■ H₁: There is a difference between the observed and the theoretical distribution.


In order to tell how closely the model fits the observed results you need to have a measure of the 
goodness of fit between the observed frequencies and the expected frequencies. 


This measure used for goodness of fit may be understood by looking further at the results of the dice-throwing experiment.

The results and the expected frequencies are:

Number on dice, n | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency, $O_i$ | 23 | 15 | 25 | 18 | 21 | 18
Expected frequency, $E_i$ | 20 | 20 | 20 | 20 | 20 | 20

You can show this as a bar chart:

[Bar chart: observed frequency and expected frequency (vertical axis) against number on the dice, 1 to 6 (horizontal axis)]

The thing you instinctively look at is the difference between the observed and the expected values.

As a measure of the size of these differences you take the sum of the squares of the differences, divided by the expected frequency:

$\sum \frac{(O_i - E_i)^2}{E_i}$ where

$O_i$ = an observed frequency

$E_i$ = an expected (theoretical) frequency, asserted by the null hypothesis


This gives a positive number that gets larger as the differences between the observed and the 
expected frequencies get larger, and smaller as the differences get smaller. 


■ The measure of goodness of fit is:

$X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$

The symbol X² is used rather than just X because it shows that the value is never going to be negative.

You can see that the less good the fit, the larger the difference between each observed and expected value, and the greater the value of X².


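■ Here is another way of calculating X²:

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \sum \frac{O_i^2 - 2O_iE_i + E_i^2}{E_i}$ (Multiply out the bracket.)

$= \sum \frac{O_i^2}{E_i} - \sum \frac{2O_iE_i}{E_i} + \sum \frac{E_i^2}{E_i}$

$= \sum \frac{O_i^2}{E_i} - 2\sum O_i + \sum E_i$

$\sum E_i$ and $\sum O_i$ are both equal to the total number of trials, or observations. So $\sum E_i = \sum O_i = N$, which gives:

$X^2 = \sum \frac{O_i^2}{E_i} - 2N + N = \sum \frac{O_i^2}{E_i} - N$

This formula is not given in the formulae booklet, but is easier to use.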
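The two forms of the statistic can be checked against each other numerically. The sketch below is illustrative only: the function names are hypothetical, and the data are the dice frequencies from the start of this section.

```python
def chi_squared_direct(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_shortcut(observed, expected):
    """Sum of O^2 / E over all cells, minus the total frequency N."""
    n = sum(observed)
    return sum(o ** 2 / e for o, e in zip(observed, expected)) - n


# Dice-throwing data from this section: 120 throws, 20 expected per face.
observed = [23, 15, 25, 18, 21, 18]
expected = [20] * 6

print(round(chi_squared_direct(observed, expected), 3))    # 3.4
print(round(chi_squared_shortcut(observed, expected), 3))  # 3.4
```

Both functions return the same value (up to floating-point rounding) whenever the observed and expected totals agree, which is the condition used in the derivation.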
Example

Billy and Mel each have two 4-sided spinners numbered 1–4. They each carry out experiments, where they spin their spinners at the same time, and add the scores together. After each student has carried out 160 experiments, the frequency distributions are as follows:

Number, n | 2 | 3 | 4 | 5 | 6 | 7 | 8
Observed by Billy ($O_i$) | 12 | 15 | 22 | 41 | 33 | 21 | 16
Observed by Mel ($O_i$) | 6 | 12 | 21 | 37 | 35 | 29 | 20
Expected ($E_i$) | 10 | 20 | 30 | 40 | 30 | 20 | 10


Both Billy and Mel believe that their spinners are fair. 


a State the null and alternative hypotheses for the experiment. 


One of the students has a biased spinner. 


b Calculate the goodness of fit for both students, and determine which of them is most likely to have the biased spinner.

a H₀: the observed distribution is the same as the theoretical distribution. (The spinner is unbiased.)
H₁: the observed distribution is different to the theoretical distribution. (The spinner is biased.)


b
n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total
Billy, $\frac{(O_i - E_i)^2}{E_i}$ | 0.4 | 1.25 | 2.133 | 0.025 | 0.3 | 0.05 | 3.6 | 7.758
Billy, $\frac{O_i^2}{E_i}$ | 14.4 | 11.25 | 16.133 | 42.025 | 36.3 | 22.05 | 25.6 | 167.758
Mel, $\frac{(O_i - E_i)^2}{E_i}$ | 1.6 | 3.2 | 2.7 | 0.225 | 0.833 | 4.05 | 10 | 22.608
Mel, $\frac{O_i^2}{E_i}$ | 3.6 | 7.2 | 14.7 | 34.225 | 40.833 | 42.05 | 40 | 182.608

Results for Billy: $X^2 = \sum_{i=2}^{8} \frac{(O_i - E_i)^2}{E_i} = 7.758$
or using the alternative method:
$X^2 = \sum_{i=2}^{8} \frac{O_i^2}{E_i} - N = 167.758 - 160 = 7.758$

Results for Mel: $X^2 = \sum_{i=2}^{8} \frac{(O_i - E_i)^2}{E_i} = 22.608$
or using the alternative method:
$X^2 = \sum_{i=2}^{8} \frac{O_i^2}{E_i} - N = 182.608 - 160 = 22.608$

Watch out: The higher the value of X², the less similar the observed distribution is to the theoretical distribution.

Mel's value of X² is higher, so she is more likely to have the biased spinner.
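The expected frequencies in this example come from listing the 16 equally likely pairs of scores on two fair 4-sided spinners. A minimal sketch of that calculation, followed by the goodness-of-fit statistic for each student, is given below; the helper names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def expected_spinner_totals(trials, sides=4):
    """Expected frequency of each total for two fair spinners over `trials` spins."""
    counts = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    outcomes = sides ** 2  # 16 equally likely (first, second) pairs
    return {total: trials * counts[total] / outcomes for total in sorted(counts)}


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = list(expected_spinner_totals(160).values())
billy = [12, 15, 22, 41, 33, 21, 16]
mel = [6, 12, 21, 37, 35, 29, 20]

x2_billy = chi_squared(billy, expected)
x2_mel = chi_squared(mel, expected)
print(expected)                                # [10.0, 20.0, 30.0, 40.0, 30.0, 20.0, 10.0]
print(round(x2_billy, 3), round(x2_mel, 3))    # 7.758 22.608
```

The larger statistic belongs to Mel, which matches the conclusion of the worked example.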
Exercise 6A


1 An octagonal dice is thrown 500 times and the results are noted. It is assumed that the dice is 
unbiased. A test is to be done to see whether the observed results differ from the expected ones. 
Write down a null hypothesis and an alternative hypothesis that can be used. 


2 A six-sided dice is rolled 180 times to try to establish whether or not it is fair. The results of the 
rolls are as follows: 


Number, n | 1 | 2 | 3 | 4 | 5 | 6
Observed rolls ($O_i$) | 27 | 33 | 31 | 28 | 34 | 27

a State the null and alternative hypotheses for the experiment.
b Calculate X² for the observed data.

3 A random sample of 750 UK secondary school students is taken, and the year group they are each in is recorded:

Year | 7 | 8 | 9 | 10 | 11
Observed ($O_i$) | 190 | 145 | 145 | 140 | 130

A researcher wants to test to see whether UK secondary school students are uniformly distributed across each year group.

a State suitable null and alternative hypotheses.

b Calculate the expected number of students in each year group assuming your null hypothesis is true.

c Calculate X² for the observed data.

4 A particular genetic mutation is believed to have a 75% chance of being passed from parent to child. In an experiment, 160 adults with the mutation each had one of their children tested to see if the child had inherited the mutation. The results were as follows:

Mutation present | Yes | No
Observed ($O_i$) | 117 | 43

a Calculate the expected frequencies.
b State the null and alternative hypotheses.
c Calculate the goodness of fit of the data to the expected result.

5 John has two coins that he can't tell apart. One is fair. The other is biased and will land on heads with probability 0.6. He flips one of the coins 50 times and records the results in the frequency table given below.

Result | H | T
Observed ($O_i$) | 28 | 22

a Calculate the expected frequencies for each coin.

b Calculate the goodness of fit between the observed results and the expected results for each coin.

c Which coin is John more likely to have been using? Give a reason for your answer.

6 The BMI profile of English adults is given below.

Country | Underweight | Normal | Overweight | Obese | Total
England | 2% | 35% | 36% | 27% | 100%
Obesity Statistics, House of Commons Briefing Paper, Number 3336, 2017

You may assume that these percentages reflect the true distribution. A sample is taken of adults in Wales, and the results are recorded in the table below.

Country | Underweight | Normal | Overweight | Obese | Total
Wales (Men) | 4 | 70 | 80 | 46 | 200
Wales (Women) | 6 | 81 | 65 | 48 | 200

By calculating the goodness-of-fit statistic for both Welsh men and women, determine which group more closely matches the English distribution.


6.2 Degrees of freedom and the chi-squared (χ²) family of distributions

An important consideration when deciding goodness of fit is the number of degrees of freedom.

In general, degrees of freedom are calculated from the size of the sample. They are a measure of the amount of information from the sample data that has not been used up. Every time a statistic is calculated from a sample, one degree of freedom is used up.

In this chapter, in order to create a model for the observed frequency distribution, you must use the information about the data in order to select a suitable model. To begin with you have n observed frequencies, and your model has to have the same total frequency as the observed distribution. The requirement that the totals have to agree is called a constraint, or restriction, and uses up one of your degrees of freedom.

The number of constraints will also depend on the number of parameters needed to describe the distribution and whether or not these parameters are known. If you do not know a parameter you have to estimate it from the observed data and this uses up a further degree of freedom.

Problem-solving: If a coin is flipped 100 times and you record the results, there are two observed frequencies: the number of heads, and the number of tails. There is one constraint: the fact that the total frequency must be 100. Therefore the number of degrees of freedom is 2 − 1 = 1. If you know one frequency, x, then you can calculate the other, 100 − x. Similarly, if a dice is rolled 120 times, there are six observed frequencies and one constraint (the total number of rolls). So there are 6 − 1 = 5 degrees of freedom. Setting values for any 5 of the frequencies uniquely determines the 6th. → Section 6.4

It is usual to refer to each rectangle of a table that contains an observation as a cell. You sometimes have to combine frequencies from different cells of the table. (The reason for this is given on the next page.)

Watch out: If the estimate of a parameter is calculated then it is a restriction. If it is guessed by using an estimate that seems sensible from observations then it is not a restriction.

If cells are combined in this way then there are fewer expected values, so when you calculate the number of degrees of freedom you have to count the number of cells after any such combination and subtract the number of constraints from this.

Number of degrees of freedom = number of cells (after any combining) − number of constraints

The χ² (pronounced kye-squared) family of distributions can be used as approximations for the statistic X². We write this:

$X^2 = \sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2$

X² is approximated well by χ² as long as none of the expected values ($E_i$) fall below 5.

■ If any of the expected values are less than 5, then you have to combine frequencies in the data table until every expected value is at least 5.

Usually frequencies adjacent to each other in the table are joined together because if one value is low the next one is also likely to be low.

Notation: The number of degrees of freedom in a χ² distribution is written using the Greek letter ν, which is pronounced 'nu'.

The χ² family of distributions are theoretical ones. The probability distribution function of each member of the family depends on the number of degrees of freedom.

To distinguish which member of the family of distributions you are talking about you write $\chi^2_\nu$. Thus $\chi^2_4$ is the χ² distribution with ν = 4.

■ When selecting which of the χ² family to use as an approximation for X² you select the distribution which has ν equal to the number of degrees of freedom of your expected values.


In a sample of 100 households, the expected number of dogs is as follows: 


Dogs | 0 | 1 | 2 | 3 | 4 | 5 | >5 | Total
Expected | 55 | 20 | 10 | 7 | 4 | 3 | 1 | 100

Select an appropriate chi-squared distribution to model the goodness of fit, X², for these data.

Combining the frequencies:

Dogs | 0 | 1 | 2 | 3 | >3 | Total
Expected | 55 | 20 | 10 | 7 | 8 | 100

Degrees of freedom = 5 − 1 = 4

Therefore $X^2 \sim \chi^2_4$ is the correct approximation.
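The combining step can be sketched in code. The pooling rule below (work from the right-hand end of the table, merging adjacent cells until each pooled expected value is at least 5) is an assumption made for illustration; the text only requires that adjacent cells are combined until no expected value is below 5. The function names are hypothetical.

```python
def pool_small_tail_cells(expected, minimum=5):
    """Merge adjacent cells, working from the right, until each is >= minimum."""
    pooled = []
    running = 0
    for value in reversed(expected):
        running += value
        if running >= minimum:
            pooled.append(running)
            running = 0
    if running > 0:
        # Anything left over at the left-hand end joins its nearest neighbour.
        if pooled:
            pooled[-1] += running
        else:
            pooled.append(running)
    pooled.reverse()
    return pooled


def degrees_of_freedom(cells, constraints=1):
    """Number of cells after combining, minus the number of constraints."""
    return cells - constraints


expected = [55, 20, 10, 7, 4, 3, 1]
combined = pool_small_tail_cells(expected)
print(combined)                               # [55, 20, 10, 7, 8]
print(degrees_of_freedom(len(combined)))      # 4
```

The observed frequencies must be combined in exactly the same cells before X² is calculated.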


You began this chapter by forming a null hypothesis that there was no difference between the 
observed and the theoretical distributions. 


Next, you found a measure of goodness of fit.

The question which arises is, 'could the value of $X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$ calculated for your sample come from a population for which X² is equal to zero?'

As with all hypothesis tests, you will only reject the null hypothesis if, by accepting the alternative hypothesis, you have only a small chance of being wrong. Typically this probability is set at 5%.

To find the value of X² that is only exceeded with probability of 5% (the critical value), we use the appropriate χ² distribution.

Notation: The critical value of χ² which is exceeded with probability 5% is written $\chi^2_\nu(5\%)$ or $\chi^2_\nu(0.05)$.

Example 3

With ν = 5 find the value of χ² that is exceeded with 0.05 probability.

$\chi^2_5(5\%) = 11.070$
This is shown on the probability diagram below.

[Probability diagram: the $\chi^2_5$ distribution with the area 0.05 to the right of the critical value shaded, shown alongside an extract from the table of percentage points]

Also from the table, $\chi^2_5(10\%) = 9.236$ and $\chi^2_5(2.5\%) = 12.832$.

For each other value of ν the critical values may be looked up in the same way.
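The tabulated critical values can also be reproduced numerically. The sketch below is not part of the course method, which uses the printed table; it assumes a whole-number ν, evaluates the upper-tail probability of the χ² distribution with a standard recurrence for the regularised incomplete gamma function, and then finds the critical value by bisection. The function names are hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared variable with df degrees of freedom exceeds x), integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)              # tail for 2 degrees of freedom
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))   # tail for 1 degree of freedom
    # Each step raises the degrees of freedom by 2.
    while a < df / 2.0 - 1e-9:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


def critical_value(alpha, df):
    """Value exceeded with probability alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# These should agree closely with the table values 11.070 and 9.236 for nu = 5.
print(critical_value(0.05, 5))
print(critical_value(0.10, 5))
```

Printed tables are rounded, so small differences in the third decimal place are to be expected.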


Example 4

a Find the following critical values:
i $\chi^2_3(95\%)$ ii $\chi^2_4(10\%)$

b Find the smallest values of y such that:
i $P(\chi^2_2 > y) = 0.95$ ii $P(\chi^2_4 > y) = 0.99$

Online: Explore the chi-squared distribution and use it to determine critical values for goodness of fit using GeoGebra.

a i ν = 3
Level of significance = 0.95
From the table on page 192, the critical value of χ² is 0.352.
ii ν = 4
Level of significance = 0.1
From the table on page 192, the critical value of χ² is 7.779.

b i $P(\chi^2_2 > 0.103) = 95\%$
ii $P(\chi^2_4 > 0.297) = 99\%$

Exercise 6B

1 A group of 50 students record the days (Monday-Sunday) that their birthdays will fall on this year. How many degrees of freedom are there in the frequency distribution?

2 For 5 degrees of freedom find the critical value of χ² which is exceeded with a probability of 5%.


3 Find the following critical values: 
a x3 (5%) b x; U%) € Xj, 10%) 


4 With ν = 10 find the value of χ² that is exceeded with 0.05 probability.
5 With ν = 8 find the value of χ² that is exceeded with 0.10 probability.

6 The random variable Y has a χ² distribution with 8 degrees of freedom.
Find y such that P(Y > y) = 0.99.

7 The random variable X has a χ² distribution with 5 degrees of freedom.
Find x such that P(X > x) = 0.95.

8 The random variable Y has a χ² distribution with 12 degrees of freedom.
Find:
a y such that P(Y < y) = 0.05
b y such that P(Y < y) = 0.95.

Notation: P(Y < y) = 1 − P(Y > y).

9 The frequency table below shows 50 samples from what is believed to be a Geo(0.5) distribution.
x 1 2 3 4 5 
P(X =x) 24 12 6 6 


a State, with a reason, which cells should be combined before using a χ² distribution to model the goodness of fit for these data. (1 mark)

b State a suitable χ² distribution and find the critical value for a 1% significance level. (2 marks)


6.3 Testing a hypothesis

By using a suitable χ² distribution to model the goodness of fit you can carry out a hypothesis test for the null hypothesis that the observed data fits the theoretical distribution.

Notation: α is used to denote the significance level to which the data is being tested.

You need to choose a significance level, α, for the hypothesis test. This is often 5%, and will be given in the question.

You can then compute the critical value, $\chi^2_\nu(\alpha)$, which will depend on the significance level, α, and the number of degrees of freedom, ν. The probability of observing data with a goodness of fit exceeding the critical value is α.

■ If X² exceeds the critical value, it is unlikely that the null hypothesis is correct, so you reject it in favour of the alternative hypothesis.

Watch out: A hypothesis test for goodness of fit is always one-tailed. This means the critical region is always the set of values greater than the critical value.

Example

In an experiment where a dice is rolled 120 times, the frequency distribution is to be compared to a discrete uniform distribution as shown:

Number on dice, n | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency, $O_i$ | 23 | 15 | 25 | 18 | 21 | 18
Expected frequency, $E_i$ | 20 | 20 | 20 | 20 | 20 | 20

Test, at the 5% significance level, whether or not the observed frequencies could be modelled by a discrete uniform distribution.

H₀: The observed distribution can be modelled by a discrete uniform distribution. (The dice is not biased.)

H₁: The observed distribution cannot be modelled by a discrete uniform distribution. (The dice is biased.)

The number of degrees of freedom is 6 − 1 = 5.
From the table on page 192 the critical value of χ² is 11.070 at the 5% level, i.e. $\chi^2_5(5\%) = 11.070$.

In this case you can calculate X² as follows:

Number | 1 | 2 | 3 | 4 | 5 | 6 | Total
$O_i$ | 23 | 15 | 25 | 18 | 21 | 18 | 120
$E_i$ | 20 | 20 | 20 | 20 | 20 | 20 | 120
$\frac{(O_i - E_i)^2}{E_i}$ | 0.45 | 1.25 | 1.25 | 0.2 | 0.05 | 0.2 | 3.4
$\frac{O_i^2}{E_i}$ | 26.45 | 11.25 | 31.25 | 16.20 | 22.05 | 16.20 | 123.4

$\sum \frac{(O_i - E_i)^2}{E_i} = 3.4$ or $\sum \frac{O_i^2}{E_i} - N = 123.4 - 120 = 3.4$

Since 3.4 < 11.070 there is not enough evidence to reject the null hypothesis at the 5% level.

There is no evidence that the dice is biased.
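The decision rule in this example can be written as a short routine. This is a sketch only: the function name is hypothetical, and the critical value is passed in from the printed table rather than computed.

```python
import math


def goodness_of_fit_test(observed, expected, critical_value):
    """Return (X^2, reject_H0) for a one-tailed goodness-of-fit test."""
    if not math.isclose(sum(observed), sum(expected)):
        raise ValueError("observed and expected totals must agree")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # The critical region is the set of values greater than the critical value.
    return statistic, statistic > critical_value


observed = [23, 15, 25, 18, 21, 18]
expected = [20, 20, 20, 20, 20, 20]

# 11.070 is the tabulated 5% critical value for 5 degrees of freedom.
statistic, reject = goodness_of_fit_test(observed, expected, 11.070)
print(round(statistic, 3), reject)   # 3.4 False
```

A result of `False` corresponds to "not enough evidence to reject H₀"; it does not prove that the model is correct.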


Example 


Alan has two identical 4-sided spinners with the numbers 1—4 written on each of them. He carries 
out experiments, where he spins both of his spinners at the same time, and adds the scores together. 
After 160 experiments, the frequency distribution is as follows: 


Number, n | 2 | 3 | 4 | 5 | 6 | 7 | 8
Observed by Alan ($O_i$) | 14 | 11 | 26 | 33 | 42 | 18 | 16
Expected ($E_i$) | 10 | 20 | 30 | 40 | 30 | 20 | 10

The table also shows the expected distribution of the scores if both spinners are unbiased.

Test, at the 2.5% significance level, whether the observed frequencies could be modelled by the expected distribution shown.

H₀: The observed distribution can be modelled by the expected distribution shown. (The spinners are not biased.)

H₁: The observed distribution cannot be modelled by the expected distribution shown. (The spinners are biased.)

The number of degrees of freedom is 7 − 1 = 6.

From the table on page 192, the critical value of χ² is 14.449 at the 2.5% level, i.e. $\chi^2_6(2.5\%) = 14.449$.

We calculate X² as follows:

n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total
$O_i$ | 14 | 11 | 26 | 33 | 42 | 18 | 16 | 160
$E_i$ | 10 | 20 | 30 | 40 | 30 | 20 | 10 | 160
$\frac{(O_i - E_i)^2}{E_i}$ | 1.600 | 4.050 | 0.533 | 1.225 | 4.800 | 0.200 | 3.600 | 16.008
$\frac{O_i^2}{E_i}$ | 19.600 | 6.050 | 22.533 | 27.225 | 58.800 | 16.200 | 25.600 | 176.008

Watch out: You only need to use one of the formulae to calculate the goodness of fit in your exam; they both give the same answer.

So X² = 16.008

Since 16.008 is greater than 14.449, we reject the null hypothesis at the 2.5% level.

There is evidence, at the 2.5% level, that the spinners are biased.


Example 


A school conducted a survey into the impact that a new exercise club was having on students. 
Prior to the new club starting, 60% of students said they had no regular exercise, 30% reported 
exercising once a week and 10% reported exercising more than once a week. After the new club 
started, they surveyed the 150 students to find out how often they exercised. 


 | No regular exercise | Once a week | More than once a week | Total
Frequency | 73 | 57 | 20 | 150

Based on these data, is there evidence of a change in attitude to exercise following the introduction of the new club? Test the data at a 5% significance level.

H₀: The observed distribution has not changed from the original distribution. (The new club has had no effect on the number of times students exercise each week.)

H₁: The observed distribution has changed from the original distribution. (The new club has had an effect on the number of times students exercise each week.)

The number of degrees of freedom is 3 − 1 = 2.

From the table on page 192, the critical value of χ² is 5.991 at the 5% level, i.e. $\chi^2_2(5\%) = 5.991$.

We calculate X² as follows:

 | No exercise | Once a week | More than once a week | Total
Observed | 73 | 57 | 20 | 150
Expected | 0.6 × 150 = 90 | 0.3 × 150 = 45 | 0.1 × 150 = 15 | 150
$\frac{(O_i - E_i)^2}{E_i}$ | 3.211 | 3.200 | 1.667 | 8.078
$\frac{O_i^2}{E_i}$ | 59.211 | 72.200 | 26.667 | 158.078

So X² = 8.078.

Since 8.078 is greater than 5.991, we reject the null hypothesis at the 5% significance level.

Therefore, at the 5% significance level, there is evidence that the new club has had an effect on the number of times students exercise each week.

Watch out: A test for goodness of fit only tells you how closely the observed data matches the theoretical (or assumed) distribution. You cannot conclude that the new club has increased the amount of exercise students do, only that the amount has changed.

Exercise 6C

1 In an experiment where a dice is rolled 72 times, the frequency distribution is to be compared to a discrete uniform distribution as shown:

Number on dice, n | 1 | 2 | 3 | 4 | 5 | 6
Observed frequency, $O_i$ | 16 | 11 | 13 | 15 | 8 | 9
Expected frequency, $E_i$ | 12 | 12 | 12 | 12 | 12 | 12

Test, at the 5% significance level, whether or not the observed frequencies could be modelled by a discrete uniform distribution.

2 In a tombola, tickets ending in a 0 or a 5 are guaranteed a prize. All other tickets will lose. At a fair, 120 tickets were drawn, and the numbers of winning tickets were as follows.

 | Winning | Losing | Total
Observed ticket draws | 15 | 105 | 120

Hint: If the tickets were fairly distributed, it would be expected that 2 in every 10 would be winning tickets.

Test, at the 5% significance level, whether or not the tombola was fair. (6 marks)

3 A local travel agent has made a prediction as to how many trips abroad his customers make. He surveys a sample of 100 customers and compares the results to his expectations.

Trips abroad | None | One | Two or more
Expected | 10% | 60% | 30%
Sample | 4 | 73 | 23

Test, at the 2.5% significance level, whether the travel agent's prediction fits the observed data. (6 marks)

4 In a sample of 100 households, the actual and expected numbers of dogs is as follows:

Dogs | 0 | 1 | 2 | 3 | 4 | 5 | >5 | Total
Observed | 45 19 11 8 7 4 100
Expected | 55 | 20 | 10 | 7 | 4 | 3 | 1 | 100

a Explain why there are 4 degrees of freedom in this case. (2 marks)

b Test, at the 5% significance level, whether the observed data fits the expected distribution given. (5 marks)

Problem-solving: When using a χ² distribution to approximate the distribution of X², if any of the expected frequencies is less than 5 you need to combine cells.

5 In the year 2000, the birth weights of babies were distributed as follows:

Weight (g) | Under 1500 | 1500-1999 | 2000-2499 | 2500-2999 | 3000-3499 | 3500 and over | Total
Percentage | 1.3% | 1.5% | 5% | 16.5% | 35.7% | 40% | 100%

In the year 2015, the birth weights of babies were as follows:

Weight (g) | Under 1500 | 1500-1999 | 2000-2499 | 2500-2999 | 3000-3499 | 3500 and over | Total
Frequency | 7286 | 9304 | 32 121 | 112 535 | 244 472 | 281 942 | 687 660

Using a 5% significance level, decide whether the distribution of birth weights from 2000 can be used as a model for the weights in 2015. (8 marks)


6.4 Testing the goodness of fit with discrete data

The steps you need to take to test goodness of fit with discrete data can be summarised as follows.

1 Determine which distribution is likely to be a good model by examining the conditions applying to the observed data. These will often be given in the question.

2 Set the significance level, for example, 5%.

3 Estimate parameters (if necessary) from your observed data.

4 Form your hypotheses.

5 Calculate expected frequencies.

6 Combine any expected frequencies so that none are less than 5.

7 Find ν using ν = number of cells after combining − number of constraints or restrictions.

8 Find the critical value of χ² from the table.

9 Calculate X².

10 See if your value is significant.

11 Draw the appropriate conclusion and interpret in the context of the original problem.

Testing a discrete uniform distribution as a model

You have already seen an example of this. The conditions under which a discrete uniform distribution arises are:

● the discrete random variable X is defined over a set of k distinct values

● each value is equally likely

The probability of each value is given by

$P(X = x_r) = \frac{1}{k}, \quad r = 1, 2, 3, \ldots, k$

The frequencies for a sample size of N are given by

$\text{Frequency} = P(X = x_r) \times N = \frac{1}{k} \times N, \quad r = 1, 2, \ldots, k$

In a discrete uniform distribution, the probability of each outcome is only dependent on the size of the sample space. This means that there are no additional parameters to estimate, so the only restriction is that the expected frequencies add up to N. The number of degrees of freedom is one less than the number of cells, after any cells have been combined.

Example

100 digits between 0 and 9 are selected from a table with the frequencies as shown below.

Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Frequency | 11 | 8 | 8 | 7 | 8 | 9 | 12 | 9 | 13 | 15

Could the digits be from a random number table? Test at the 0.05 level.

Each digit should have an equal chance of selection, so the appropriate model is the discrete uniform distribution.

H₀: A discrete uniform distribution is a suitable model. (The digits are random.)

H₁: A discrete uniform distribution is not a suitable model. (The digits are not random.)

Each expected frequency is 100 × 1/10 = 10.

The number of degrees of freedom is:

ν = 10 − 1 = 9

From the table on page 192, $\chi^2_9(5\%) = 16.919$

Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Observed, $O_i$ | 11 | 8 | 8 | 7 | 8 | 9 | 12 | 9 | 13 | 15
Expected, $E_i$ | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10
$\frac{(O_i - E_i)^2}{E_i}$ | 0.1 | 0.4 | 0.4 | 0.9 | 0.4 | 0.1 | 0.4 | 0.1 | 0.9 | 2.5

$\sum \frac{(O_i - E_i)^2}{E_i} = 6.2$

So: $\sum \frac{(O_i - E_i)^2}{E_i} < 16.919$

Do not reject H₀: there is no evidence to suggest the digits are not random.


Testing a binomial distribution as a model 

The conditions under which a binomial distribution arises are:

● there must be a fixed number (n) of trials in each observation

● the trials must be independent

● the trials have only two outcomes: success and failure

● the probability of success (p) is constant

For a binomial random variable:
$P(X = r) = \binom{n}{r} p^r (1 - p)^{n - r}, \quad r = 0, 1, 2, \ldots, n$

The frequency $f_r$ with which each r occurs when the number of observations is N is given by
$f_r = P(X = r) \times N$

The binomial distribution has two parameters, n and p. You have the usual restriction that the expected frequencies have to have the same total as observed frequencies, while p may be known or it may be estimated from the observed values by using frequencies of success.

$p = \frac{\text{total number of successes}}{\text{number of trials} \times N} = \frac{\sum (r \times f_r)}{n \times N}$

In a binomial distribution, each observation is of the number of successes in n trials. So for N observations there are n × N trials in total.

If p is not estimated by calculation: ν = number of cells − 1
If p is estimated by calculation: ν = number of cells − 2

The data in the table are thought to be modelled by a binomial distribution B(10, 0.2). Use the table for the binomial cumulative distribution function to find expected values, and conduct a test to see if this is a good model. Use a 5% significance level.

x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency | 12 | 28 | 28 | 17 | 7 | 4 | 2 | 2 | 0

H₀: A B(10, 0.2) distribution is a suitable model for the results.

H₁: The results cannot be modelled by a B(10, 0.2) distribution.

x | 0 | 1 | 2 | 3 | 4
P(X = x) | 0.1074 | 0.2684 | 0.3020 | 0.2013 | 0.0881
Expected frequencies | 10.74 | 26.84 | 30.20 | 20.13 | 8.81

The expected frequencies for x = 5, 6, 7 and 8 are 2.64, 0.55, 0.08 and 0.01, which are all less than 5, so the cells for x ≥ 4 are combined.

There are 7 + 4 + 2 + 2 + 0 = 15 observed values when x ≥ 4.

Expected frequency = 8.81 + 2.64 + 0.55 + 0.08 + 0.01 = 12.09

x | 0 | 1 | 2 | 3 | ≥4
$O_i$ | 12 | 28 | 28 | 17 | 15
$E_i$ | 10.74 | 26.84 | 30.20 | 20.13 | 12.09
$\frac{(O_i - E_i)^2}{E_i}$ | 0.1478 | 0.0501 | 0.1603 | 0.4867 | 0.7004

Number of degrees of freedom = number of cells − 1 = 5 − 1 = 4.
(p was not estimated by calculation this time.)
From the table on page 192 the critical value $\chi^2_4(5\%)$ is 9.488.

$\sum \frac{(O_i - E_i)^2}{E_i} = 1.5453$

1.545 < 9.488
Do not reject H₀: B(10, 0.2) is a possible model for the data.
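The expected frequencies in this example can be generated directly from the binomial probability formula instead of the cumulative tables. The sketch below does this with unrounded probabilities, so its statistic differs slightly from the 1.5453 obtained from four-decimal-place table values. The helper names and the `pool_from` argument (the first value of x in the combined final cell) are hypothetical.

```python
from math import comb


def binomial_expected(n, p, total, pool_from):
    """Expected frequencies for x = 0..pool_from-1, then one pooled cell for x >= pool_from."""
    probs = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(pool_from)]
    probs.append(1 - sum(probs))  # probability of the pooled upper tail
    return [total * q for q in probs]


def pool_observed(freqs, pool_from):
    """Combine the observed frequencies in the same way as the expected ones."""
    return freqs[:pool_from] + [sum(freqs[pool_from:])]


freqs = [12, 28, 28, 17, 7, 4, 2, 2, 0]
observed = pool_observed(freqs, 4)                 # [12, 28, 28, 17, 15]
expected = binomial_expected(10, 0.2, sum(freqs), 4)

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                             # p was given, so one constraint
print(observed, df)
print(x2 < 9.488)                                  # True: do not reject H0
```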


A study of the number of girls in families with five children was done on 100 such families. 
The results are summarised in the following table. 


Number of girls (r) | 0 | 1 | 2 | 3 | 4 | 5
Frequency (f) | 13 | 18 | 38 | 20 | 10 | 1

It is suggested that the distribution may be modelled by a binomial distribution with p = 0.5.
a Give reasons why this might be so.

b Test, at the 5% significance level, whether or not a binomial distribution is a good model.

a There is a fixed number of children in the family so n = 5. The trials are independent. (Assume no multiple births.) There are two outcomes to each trial: success (a girl), failure (a boy). The assumption that a girl is as likely as a boy is reasonable.

b H₀: B(5, 0.5) is a suitable model.

H₁: B(5, 0.5) is not a suitable model.

r | 0 | 1 | 2 | 3 | 4 | 5
$O_i$ | 13 | 18 | 38 | 20 | 10 | 1
$E_i$ | 3.12 | 15.63 | 31.25 | 31.25 | 15.63 | 3.12

Since 3.12 < 5 you must combine cells.

r | 0 or 1 | 2 | 3 | 4 or 5
$O_i$ | 31 | 38 | 20 | 11
$E_i$ | 18.75 | 31.25 | 31.25 | 18.75

$\sum \frac{(O_i - E_i)^2}{E_i} = 16.715$

You have 4 − 1 = 3 degrees of freedom.
From the tables: $\chi^2_3(5\%) = 7.815$
16.715 > 7.815

Reject H₀: the number of girls in families of 5 children cannot be modelled by B(5, 0.5).


Example

Look at the data in the previous example. By estimating a suitable value of the parameter p, carry out a test, at the 5% significance level, to determine whether a binomial distribution is a suitable model for the data.

Problem-solving: In the previous example you determined that B(5, 0.5) was not a suitable model for the data, at the 5% level. However, B(5, p) may be a suitable model for some different value of p.

H₀: A binomial distribution is a suitable model.

H₁: A binomial distribution is not a suitable model.

The number of observations N = 100, the number of trials n = 5.

$p = \frac{\sum (r \times f_r)}{100n} = \frac{199}{100 \times 5} = 0.398$

and, because you estimated p, there will be two constraints.

r | P(r) | $E_i$
0 | $(0.602)^5 = 0.0791$ | 7.91
1 | $5(0.602)^4(0.398) = 0.2614$ | 26.14
2 | $10(0.602)^3(0.398)^2 = 0.3456$ | 34.56
3 | $10(0.602)^2(0.398)^3 = 0.2285$ | 22.85
4 | $5(0.602)^1(0.398)^4 = 0.0755$ | 7.55
5 | $(0.398)^5 = 0.0099$ | 0.99

r | $O_i$ | $E_i$ | $\frac{O_i^2}{E_i}$
0 | 13 | 7.91 | 21.37
1 | 18 | 26.14 | 12.39
2 | 38 | 34.56 | 41.78
3 | 20 | 22.85 | 17.51
>3 | 11 | 8.54 | 14.17
Total | | | 107.22

There are 5 − 2 = 3 degrees of freedom.
The critical value is $\chi^2_3(5\%) = 7.815$.

$\sum \frac{O_i^2}{E_i} - N = 107.22 - 100 = 7.22$
7.22 < 7.815

Do not reject H₀. A binomial distribution is a suitable model.
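The estimation step and the extra constraint it introduces can be sketched as follows. The variable names are hypothetical, and the probabilities are left unrounded, so the statistic comes out close to, but not exactly, the 7.22 found above from rounded expected frequencies.

```python
from math import comb

# Number of girls r = 0..5 in each of N = 100 families of n = 5 children.
freqs = [13, 18, 38, 20, 10, 1]
n = 5
N = sum(freqs)

# Estimate p as (total number of successes) / (n x N).
p_hat = sum(r * f for r, f in enumerate(freqs)) / (n * N)

probs = [comb(n, r) * p_hat ** r * (1 - p_hat) ** (n - r) for r in range(n + 1)]
expected = [N * q for q in probs]

# The expected frequency for r = 5 is below 5, so combine r = 4 and r = 5.
observed_pooled = freqs[:4] + [sum(freqs[4:])]
expected_pooled = expected[:4] + [sum(expected[4:])]

x2 = sum(o ** 2 / e for o, e in zip(observed_pooled, expected_pooled)) - N
df = len(observed_pooled) - 2   # two constraints: the total, and the estimate of p

print(round(p_hat, 3), df)      # 0.398 3
print(x2 < 7.815)               # True: do not reject H0
```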


Testing a Poisson distribution as a model 

The conditions under which a Poisson distribution is likely to arise are: 
e the events occur independently of each other 

e the events occur singly and at random in continuous space or time 


e the events occur at a constant rate, in the sense that the mean number in an interval is 
proportional to the length of the interval 


e the mean and the variance are equal 


For a Poisson distribution with mean λ:

P(X = r) = e^(−λ) λ^r / r!,   r = 0, 1, 2, ...

Although, theoretically, r can be any one of an infinite number of integer values, in practice, all those
values greater than or equal to some number n are put together and the probability P(X ≥ n) is found
from:

P(X ≥ n) = 1 − P(X ≤ n − 1)

You choose n equal to the highest value of r for which the observed frequency is > 0.
In Example 12, n is chosen to be 7 since all telephone calls for r ≥ 8 have zero frequencies.

The expected frequency f_r with which each r occurs is given by P(X = r) × N.


The Poisson distribution has a single parameter λ, which may be known or which may be estimated
from the observed data using:

λ = Σ(r × f_r) / N

There is the usual restriction on the total of the expected frequencies being equal to the total of the
observed frequencies.

If λ is not estimated by calculation: ν = number of cells − 1
If λ is estimated by calculation: ν = number of cells − 2
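The steps described above (estimate λ, find the expected frequencies with the top cell treated as "n or more", pool small cells, count degrees of freedom) can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the pooling helper only merges cells from the upper tail, and the data are the telephone-call frequencies for r = 0 to 7 from the example that follows.

```python
from math import exp, factorial

def poisson_expected(frequencies):
    """frequencies[r] is the observed count of r; the last cell means 'that value or more'."""
    total = sum(frequencies)
    mean = sum(r * f for r, f in enumerate(frequencies)) / total   # estimate of lambda
    top = len(frequencies) - 1
    probabilities = [exp(-mean) * mean**r / factorial(r) for r in range(top)]
    probabilities.append(1 - sum(probabilities))   # P(X >= top) = 1 - P(X <= top - 1)
    return mean, [total * q for q in probabilities]

def pool_upper_tail(observed, expected, minimum=5):
    """Merge cells from the right until the last expected frequency is at least `minimum`."""
    observed, expected = list(observed), list(expected)
    while len(expected) > 1 and expected[-1] < minimum:
        last_expected = expected.pop()
        expected[-1] += last_expected
        last_observed = observed.pop()
        observed[-1] += last_observed
    return observed, expected

calls = [8, 19, 26, 13, 7, 5, 1, 1]            # observed frequencies for r = 0..7
mean, expected = poisson_expected(calls)
observed, expected = pool_upper_tail(calls, expected)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 2          # lambda was estimated from the data

print(round(mean, 2), len(observed), round(statistic, 2), degrees_of_freedom)
```

The statistic is compared with the tabulated critical value for the resulting number of degrees of freedom; small differences from hand calculation come from rounding the probabilities to four decimal places.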


Example 


The numbers of telephone calls arriving at an exchange in six-minute periods were recorded over a 
period of 8 hours, with the following results. 


Number of calls, r 0 1 2 3 4 5 6 7 8 


Frequency, f_r 8 19 26 13 7 5 1 1 0


Can these results be modelled by a Poisson distribution? Test at the 5% significance level. 


H₀: A Poisson distribution Po(λ) is a suitable model.
H₁: The calls cannot be modelled by a Poisson distribution.

Total number of observations = N = 8 × 10 = 80

λ = Σ(r × f_r) / N = 176 / 80 = 2.2

r | P(X = r) | E_i
0 | 0.1108 | 8.864 (0.1108 × 80)
1 | 0.2438 | 19.504
2 | 0.2681 | 21.448
3 | 0.1966 | 15.728
4 | 0.1082 | 8.656
5 | 0.0476 | 3.808
6 | 0.0174 | 1.392
7 or more | 0.0075 | 0.6

The expected frequencies for r = 5, r = 6 and r = 7 or more are all less than 5, so these cells are
combined.

r | O_i | E_i | (O_i − E_i)²/E_i
0 | 8 | 8.864 | 0.0842
1 | 19 | 19.504 | 0.0130
2 | 26 | 21.448 | 0.9661
3 | 13 | 15.728 | 0.4732
4 | 7 | 8.656 | 0.3168
5 or more | 7 | 5.8 | 0.2483

Σ (O_i − E_i)² / E_i = 2.1016

You have 6 − 2 = 4 degrees of freedom.
From the table on page 192, χ²₄(5%) = 9.488
2.1016 < 9.488

There is not enough evidence to reject H₀.
The calls may be modelled by a Poisson distribution.


Exercise 6D


1 The following table shows observed values for a distribution which it is thought may be modelled
by a Poisson distribution.

x 0 1 2 3 4 5 >5
Frequency of x 12 23 24 24 12 5 0

A possible model is thought to be Po(2). From tables, the expected values are found to be as
shown in the following table.

x 0 1 2 3 4 5 >5
Expected frequency of x 13.53 27.07 27.07 18.04 9.02 3.61 1.66

a Conduct a goodness-of-fit test at the 5% significance level.

b It is suggested that the model could be improved by estimating the value of λ from the
observed results. What effect would this have on the number of constraints placed upon the
degrees of freedom?


2 A mail-order firm receives packets every day through the mail. 


They think that their deliveries are uniformly distributed throughout the week. Test this assertion, 
given that their deliveries over a four-week period were as follows. Use a 0.05 significance level. 


Day Mon Tues Wed Thurs Fri Sat
Frequency 15 23 19 20 14 11


3 Over a period of 50 weeks the numbers of road accidents reported to a police station were
as shown.

Number of accidents 0 1 2 3
Number of weeks 15 13 9 13


a Find the mean number of accidents per week. 


b Using this mean and a 0.10 significance level, test the assertion that these data are from a 
population with a Poisson distribution. 


4 A marksman fires 6 shots at a target and records the number r of bullseye hits. After a series
of 100 such trials he analyses his scores, the frequencies being as follows.


r 0 1 2 3 4 5 6
Frequency 0 26 36 20 10 6 2


a Estimate the probability of hitting a bullseye. 


b Use a test at the 0.05 significance level to see if these results are consistent with the 
assumption of a binomial distribution. 


5 The table below shows the numbers of employees, in thousands, at five factories and the numbers
of accidents in 3 years.


Factory A B C D E
Employees (thousands) 4 3 5 1 2
Accidents 22 14 25 8 12


Using a 0.05 significance level, test the hypothesis that the number of accidents per 1000 
employees is constant at each factory. (6 marks) 


6 In a test to determine the red blood cell count in a patient’s blood sample, the number of cells
in each of 80 squares is counted with the following results.


Number of cells per square, x 0 1 2 3 4 5 6 7 8
Frequency, f 2 8 15 18 14 13 7 3 0


It is assumed that these will fit a Poisson distribution. Test this assertion at the 0.05 
significance level. (10 marks) 


7 A factory has a machine. The number of times it broke down each week was recorded over 
100 weeks with the following results. 


Number of times broken down 0 1 2 3 4 5 
Frequency 50 24 12 9 5 0 


It is thought that the distribution is Poisson. 
a Give reasons why this assumption might be made. (2 marks) 
b Conduct a test at the 0.05 level of significance to see if the assumption is reasonable. (8 marks)

8 In a lottery there are 505 prizes, and it is assumed that they will be uniformly distributed
throughout the numbered tickets. An investigation gave the following:


Ticket number | 1-1000 | 1001-2000 | 2001-3000 | 3001-4000 | 4001-5000 | 5001-6000 | 6001-7000 | 7001-8000 | 8001-9000 | 9001-10000
Frequency | 56 | 49 | 35 | 47 | 63 | 58 | 44 | 52 | 51 | 50


Using a suitable test with a 0.05 significance level, and stating your null and alternative 
hypotheses, see if the assumption is reasonable. (6 marks) 


9 Data were collected on the numbers of female puppies born in 200 litters of 8 puppies. It was
decided to test whether or not a binomial model with parameters n = 8 and p = 0.5 is a suitable 
model for the data. The following table shows the observed frequencies and the expected 
frequencies, to 2 decimal places, obtained in order to carry out this test. 


Number of females Observed number of litters Expected number of litters 
0 1 0.78 
1 9 6.25 
2 27 21.88 
3 46 R 
4 49 S 
5 35 T 
6 26 21.88 
7 5 6.25 
8 2 0.78 
a Find the values of R, S and T. (3 marks)
b Carry out the test to determine whether or not this binomial model is a suitable one. 
State your hypotheses clearly and use a 5% level of significance. (5 marks) 


An alternative test might have involved estimating p rather than assuming p = 0.5. 
c Explain how this would have affected the test. (2 marks) 


10 A random sample of 300 football matches was taken and the number of goals scored in each
match was recorded. The results are given in the table below.


Number of goals 0 1 2 3 4 5 6 7 
Frequency 33 55 80 56 56 11 5 4 
a Show that an estimate of the mean number of goals scored in a football match 
is 2.4 and find an estimate of the variance. (3 marks) 
It is thought that a Poisson distribution might provide a good model for the number of goals 
per match. 
b Give one reason why the observed data might support this model. (1 mark) 


Using a Poisson distribution, with mean 2.4, expected frequencies were calculated as follows: 


Number of goals 0 1 2 3 4 5 6 7 
Expected frequency s 65.3 t 62.7 37.6 18.1 7.2 2.5
c Find the values of s and t. (2 marks)
d State clearly the hypotheses required to test whether or not a Poisson distribution provides
a suitable model for these data. (1 mark)


In order to carry out this test, the class for 7 goals is redefined as 7 or more goals. 

e Find the expected frequency for this class. (1 mark) 
The test statistic for the test in part d is 15.7 and the number of degrees of freedom used is 5. 

f Explain fully why there are 5 degrees of freedom. (1 mark) 


g Stating clearly the critical value used, carry out the test in part d, using a 5% level of 
significance. (3 marks) 


11 A student of botany believed that a certain species of wild orchid plants grow in random
positions in grassy meadowland. He recorded the number of plants in one square metre of 
grassy meadow, and repeated the procedure to obtain the 148 results in the table. 


Number of plants 0 1 2 3 4 5 6 7 or greater 
Frequency 9 24 43 34 21 15 2 0 
a Show that, to two decimal places, the mean number of plants in one square metre 
is 2.59. (2 marks) 
b Give a reason why the Poisson distribution might be an appropriate model for 
these data. (1 mark) 


Using the Poisson model with mean 2.59, expected frequencies corresponding to the given 
frequencies were calculated, to two decimal places, and are shown in the table below. 


Number of plants 0 1 2 3 4 5 6 7 or greater 
Expected frequencies 11.10 28.76 s 32.15 20.82 10.78 4.65 t
c Find the values of s and t to two decimal places. (2 marks)


d Stating clearly your hypotheses, test at the 5% level of significance whether or not this 
Poisson model is supported by these data. (5 marks) 


6.5 Using contingency tables


So far in this chapter you have considered the frequency with which a single event occurs. For example, 
you might count the number of times each of the numbers 1 to 6 appears when a dice is thrown 100 
times. Sometimes however, we may be interested in the frequencies with which two criteria are fulfilled 
at the same time. If you study the frequency with which A-Level Maths passes at grades A, B and C 
occur you may also be interested in which of two schools the students attended. Here you have two 
criteria: the pass level and the school. You can show these results by means of a contingency table, 
which shows the frequency with which each of the results occurred at each school separately. 


School (criterion 2) | Pass (criterion 1): A | B | C | Totals
X | 18 | 12 | 20 | 50
Y | 26 | 12 | 32 | 70
Totals | 44 | 24 | 52 | 120

18 students at school X got a grade A pass.
32 students at school Y got a grade C pass.
A total of 44 students out of a total of 120 got a grade A pass.


This is called a 2 × 3 contingency table since there are two rows and three columns.

Setting the hypotheses


What we are interested in is whether there is any association between the two schools’ sets of results. 
We pose the hypothesis ‘are the two criteria independent?’ 


H₀: School and grade of pass are independent.
H₁: School and grade of pass are not independent.


Selecting a model 


If the null hypothesis H₀ is true then you would expect school X to get 50/120 of each grade and
school Y to get 70/120 of each grade.


Now, overall: P(A grade) = 44/120

P(school X) = 50/120

So
P(A grade and school X) = P(A grade) × P(school X) = 44/120 × 50/120

The expected frequency of passes at A from school X is therefore

44/120 × 50/120 × 120 = (44 × 50)/120 = 18.33

Notice that:

Expected frequency = (row total × column total) / grand total


The expected frequency is calculated on the assumption that the criteria are independent. You can 
find the other expected frequencies in the same way. These are shown in the table. 


School (criterion 2) | Pass (criterion 1): A | B | C | Totals
X | (50 × 44)/120 = 18.33 | (50 × 24)/120 = 10 | (50 × 52)/120 = 21.67 | 50
Y | (70 × 44)/120 = 25.67 | (70 × 24)/120 = 14 | (70 × 52)/120 = 30.33 | 70
Totals | 44 | 24 | 52 | 120
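The rule "row total × column total ÷ grand total" is easy to apply to a whole table at once. The following Python sketch, with variable names chosen only for this illustration, rebuilds the expected frequencies for the two schools from the observed table.

```python
observed = [
    [18, 12, 20],   # school X: grades A, B, C
    [26, 12, 32],   # school Y: grades A, B, C
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total x column total / grand total.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# prints [18.33, 10.0, 21.67] then [25.67, 14.0, 30.33]
```

Each row and each column of the expected table adds up to the same total as in the observed table, which is why the last value in any row or column need not be calculated separately.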
Degrees of freedom

When calculating expected values you need not calculate the last value in each row because the sum
of the values in each row has to equal the row total. This creates one constraint on the number of
degrees of freedom.

For example, the expected frequency of students who obtain a grade C from school X would be
50 − (18.33 + 10) = 21.67

In the same way the last value in each of the columns is fixed by the column total once the other
values in the column are known. This creates another constraint on the number of degrees of freedom.

For example, the expected frequency of students who obtain a grade A from school Y would be
44 − 18.33 = 25.67.

In general, if there are h rows and k columns, then once (k − 1) expected frequencies in a row have
been calculated the last value in the row is fixed by the row total, and once (h − 1) expected
frequencies in a column have been calculated the last value in the column is fixed by the column total.

The number of independent variables is given therefore by (h − 1)(k − 1). That is to say:

The number of degrees of freedom ν = (h − 1)(k − 1)


Example

Conduct a goodness-of-fit test, at the 5% significance level, on the data given on pages 113-114 for
the two schools X and Y.

Watch out
If the expected frequency in any column is < 5, you will need to combine columns.
Make sure you use the new number of columns after combining as your value of k when working
out the number of degrees of freedom.

H₀: School and grade of pass are independent.
H₁: School and grade of pass are not independent.

ν = (h − 1)(k − 1) = (2 − 1)(3 − 1) = 2

From tables the critical value at the 0.05 significance level is 5.991.

O_i | E_i | (O_i − E_i)²/E_i
18 | 18.33 | 0.0059
12 | 10.00 | 0.4000
20 | 21.67 | 0.1287
26 | 25.67 | 0.0042
12 | 14.00 | 0.2857
32 | 30.33 | 0.0920

Σ (O_i − E_i)² / E_i = 0.9165

So: Σ (O_i − E_i)² / E_i < 5.991

Do not reject H₀: there is insufficient evidence
to suggest an association between the school
and the grades of pass.

School and grade of pass are independent.

Example 


During the trial of a new drug, 60 volunteers 
out of 200 were treated with the drug. 


Those who experienced relief of their symptoms and 


those who did not were recorded as in the table. 


 | Relief | No relief | Totals
Treated | 10 | 50 | 60
Not treated | 40 | 100 | 140
Totals | 50 | 150 | 200


Use a suitable test to see if there is any association between treatment with the drug and relief of
symptoms. Use a 5% significance level.

H₀: Treatment and relief are independent.
H₁: Treatment and relief are not independent (associated).

Table of expected values:

 | Relief | No relief
Treated | (60 × 50)/200 = 15 | (60 × 150)/200 = 45
Not treated | (140 × 50)/200 = 35 | (140 × 150)/200 = 105

ν = (2 − 1)(2 − 1) = 1

From the table on page 192 the critical value χ²₁(5%) is 3.841.

O_i | E_i | (O_i − E_i)²/E_i
10 | 15 | 1.6667
50 | 45 | 0.5556
40 | 35 | 0.7143
100 | 105 | 0.2381

Σ (O_i − E_i)² / E_i = 3.1747
3.1747 < 3.841

So you do not reject H₀. There is no reason to believe there is an association between
treatment and relief.
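The whole contingency-table calculation can be collected into one small Python function. This is a sketch for checking working, not a definitive implementation: the function name is invented for the illustration, no pooling of small cells is attempted, and the critical value (3.841 for one degree of freedom at the 5% level) is still read from the tables. The data are the observed frequencies from the drug trial.

```python
def contingency_statistic(observed):
    """Return (test statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total   # expected frequency
            statistic += (o - e) ** 2 / e

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom

drug_trial = [
    [10, 50],     # treated: relief, no relief
    [40, 100],    # not treated: relief, no relief
]
statistic, degrees_of_freedom = contingency_statistic(drug_trial)
critical_value = 3.841   # tabulated value for 1 degree of freedom at the 5% level

print(round(statistic, 3), degrees_of_freedom, statistic < critical_value)
```

The unrounded statistic is about 3.175, which agrees with the 3.1747 obtained by adding the rounded contributions in the table.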


Exercise 6E


1 When analysing the results of a 3 x 2 contingency table it was found that 


6 
(0;- EP 
SOB 
i=1 : 


Write down the number of degrees of freedom and the critical value appropriate to these data
in order to carry out a χ² test of significance at the 5% level.


2 Three different types of locality were studied to see if the ownership, or non-ownership, of a
television was or was not related to the locality. Σ (O_i − E_i)²/E_i was evaluated and found to be 13.1.

Using a 5% level of significance, carry out a suitable test and state your conclusion.

3 In a college, three different groups of students sit the same examination. The results of the
examination are classified as Credit, Pass or Fail. In order to test whether or not there is an
association between the group and exam results, the statistic Σ (O_i − E_i)²/E_i is evaluated and
found to be equal to 10.28.


a Explain why there are 4 degrees of freedom in this situation. 
b Using a 5% level of significance, carry out the test and state your conclusions. 


4 The grades of 200 students in both Mathematics and English were studied with the following
results.


Maths grades | English grades: A | B | C
A | 17 | 28 | 18
B | 38 | 45 | 16
C | 12 | 12 | 14


Using a 0.05 significance level, test these results to see if there is an association between 
English and Mathematics results. State your conclusions. (6 marks) 


5 The number of trains on time and the number of trains that were late were observed at three
different London stations. The results were:


Station | Observed frequency: On time | Late
A | 26 | 14
B | 30 | 10
C | 44 | 26


Using the χ² statistic and a significance test at the 5% level, decide if there is any association
between station and lateness. (6 marks)


6 In addition to being classed into grades A, B, C, D and E, 200 students are classified as male or
female and their results are summarised in a contingency table.

Assuming all expected values are 5 or more, the statistic Σ (O_i − E_i)²/E_i was 14.27.

Stating your hypotheses and using a 1% significance level, investigate whether or not gender and
grade are associated.


7 In a random sample of 60 articles made in factory A, 13 were defective. In factory B, 12 out of
40 similar articles were defective.

a Draw up a contingency table.

b Test at the 0.05 significance level the hypothesis that quality was independent of the factory
involved.

8 During an influenza epidemic, 15 boys and 8 girls became ill out of a year group of 22 boys
and 28 girls. Assuming that this group may be treated as a random sample of the age group, 
test at the 5% significance level the hypothesis that there is no connection between gender and 
susceptibility to influenza. 


9 In a study of marine organisms, a biologist collected specimens from three beaches and
counted the number of males and females in each sample, with the following results:


Gender | Beach: A | B | C
Male | 46 | 80 | 40
Female | 54 | 120 | 160


Using a significance level of 5%, test these results to see if there is any association between 
the choice of beach and the gender of the organisms. (6 marks) 


10 A research worker studying the ages of adults and the number of credit cards they possess
obtained the results shown below:


Age | Number of cards: ≤ 3 | > 3
≤ 30 | 74 | 20
> 30 | 50 | 35


Use the x? statistic and a significance test at the 5% level to decide whether or not there 
is an association between age and number of credit cards possessed. (6 marks) 


11 Members of four local gyms were surveyed to find out if they had injured themselves while
working out in the last month. The results are summarised in the table below:


Gym | A | B | C | D | Total
Injured | 15 | 4 | 8 | 7 | 34
Uninjured | 222 | 254 | 167 | 188 | 831
Total | 237 | 258 | 175 | 195 | 865


A test is carried out at the 5% significance level to determine whether or not there is an 
association between injuries and choice of gym. 


a State the null hypothesis for this test. (1 mark) 
b Show that the expected frequency of members injured at gym C is 6.88 (1 mark) 


c Calculate the test statistic for this test, and state with reasons whether or not the null
hypothesis is rejected. (5 marks)

12 Millie wants to investigate whether students who studied different sciences at university get paid
the same when they get a job. She surveys science graduates who graduated from university
in the last 5 years, and records their salary information. The results are recorded in the table
below.

Science studied | Salary: £0-£20k | £20k-£40k | £40k-£60k | £60k-£80k | >£80k | Total
Biology | 4 | 69 | 23 | 5 | 3 | 104
Chemistry | 3 | 72 | 27 | 4 | 2 | 108
Physics | 2 | 68 | 32 | 5 | 4 | 111
Total | 9 | 209 | 82 | 14 | 9 | 323


She tests at the 5% significance level whether there is an association between the science studied
and pay.

a State the null and alternative hypotheses for this test. (1 mark)

b Calculate the test statistic for this test, and state with reasons whether or not the null
hypothesis is rejected. (5 marks)

Problem-solving
If any of the expected frequencies are less than 5 you have to pool columns before calculating X².


6.6 Apply goodness-of-fit tests to geometric distributions


Recall that the conditions under which a geometric distribution is likely to arise are: 


e Trials have two outcomes: success and failure 
e Trials are independent 
e Trials are performed until the first success 
e The probability of success on each trial is constant (p) 
e The measured quantity is the number of trials until the first success 
For a geometric distribution with probability of success p we have: 
P(X = r) = p(1 − p)^(r − 1), for r = 1, 2, 3, ...
As with the Poisson distribution, there is an infinite number of possible values for r, so we once again
group together all those values greater than or equal to some cut-off, n, which can be chosen to be
the largest value of r for which the observed frequency is non-zero.


If we take a sample of size N from a geometric random variable X, then the expected frequency of
observations of the value r is f_r = P(X = r) × N.
The geometric distribution has a single parameter p which can be estimated from the observed
frequencies f_r using the formula:

p = Total number of successes / Total number of trials = N / Σ(r × f_r)

Because each success represents one observation from the distribution, the total number of successes
is equal to the total frequency, N. Each observation of r from the distribution contributes r trials
to the total.

As before, if you estimate the parameter p from the observed frequencies, then that is a constraint,
and so reduces the number of degrees of freedom by 1.

Example


Sarah has a large DVD collection. Every week she picks DVDs off the shelf at random until she 
finds one that she would like to watch. Sarah thinks that there is about a 50% chance she will be 

in the mood to watch any particular DVD. Over the course of a year she records the number of 
DVDs she picks off the shelf before finding one she would like to watch. The results are recorded in 
the frequency table below. 


Number of DVDs 1 2 3 4 Total
Observed frequency O_i 33 12 5 2 52


a Calculate the expected frequencies if the number of DVDs considered is modelled as a Geo(0.5) 
random variable. 


Sarah wants to check if her guess that there is a 50% chance she’ll watch any particular DVD is 
supported by the data. 


b Formulate the null and alternative hypotheses. 
c Is Sarah right in her assumption? Test at the 5% significance level. 


a The expected frequencies are:

Number of DVDs | 1 | 2 | 3 | ≥4 | Total
Observed frequency O_i | 33 | 12 | 5 | 2 | 52
Expected frequency E_i | 26 | 13 | 6.5 | 6.5 | 52

Problem-solving
The expected frequency is calculated by E_i = P(X = i) × N = (1/2)^i × 52.
Since there are no observations greater than 4 we combine all larger values into a single
category. The expected number of observations ≥ 4 is P(X ≥ 4) × N = 1/8 × 52.

b H₀: X ~ Geo(0.5) is a suitable model.
H₁: X ~ Geo(0.5) is not a suitable model.

c We calculate the goodness of fit.

Number of DVDs | 1 | 2 | 3 | ≥4 | Total
Observed frequency O_i | 33 | 12 | 5 | 2 | 52
Expected frequency E_i | 26 | 13 | 6.5 | 6.5 | 52
(O_i − E_i)²/E_i | 1.8846 | 0.0769 | 0.3462 | 3.1154 | 5.4231

So X² = 5.4231. We can model X² with a χ²₃ random variable.

The critical value at the 5% significance level is χ²₃(5%) = 7.815. Since 5.4231 < 7.815, we do not have
enough evidence to reject the null hypothesis, so we can model the number of DVDs by a geometric
random variable with p = 0.5.
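The working for this example can be reproduced with a few lines of Python. This is only a sketch: the function name is made up for the illustration, and the final cell is treated as "4 or more" as in the solution. The last lines also show what the estimate of p from the data would be, using N / Σ(r × f_r); it is not used in this test because p = 0.5 is specified in the hypothesis.

```python
def geometric_expected(p, total, cells):
    """Expected frequencies for X = 1, ..., cells - 1 and a final pooled cell X >= cells."""
    probabilities = [p * (1 - p) ** (r - 1) for r in range(1, cells)]
    probabilities.append((1 - p) ** (cells - 1))   # P(X >= cells)
    return [total * q for q in probabilities]

observed = [33, 12, 5, 2]                  # number of DVDs: 1, 2, 3, 4 or more
expected = geometric_expected(0.5, sum(observed), len(observed))

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1     # p was given, so only the total is a constraint

# Estimate of p from the data, for comparison only.
p_estimate = sum(observed) / sum(r * f for r, f in enumerate(observed, start=1))

print(expected, round(statistic, 4), degrees_of_freedom, p_estimate)
```

If p were estimated from the data instead of being specified, the number of degrees of freedom would drop by one more.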
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Exercise (6F) 


1 The following table shows observed values for what is thought to be a geometric distribution
with p = 0.6.


k 1 2 3 4 5 6 Total
Observed frequency O_k 207 66 13 9 3 2 300


Calculate the expected frequencies and, using a 1% significance level, conduct a 
goodness-of-fit test. (6 marks) 


2 The following table shows observed values for what is thought to be a geometric distribution
with p = 0.4.


k 1 2 3 4 5 6 Total
Observed frequency O_k 42 26 10 8 10 4 100


Calculate the expected frequencies and, using a 5% significance level, conduct a 
goodness-of-fit test. (6 marks) 


3 The following table shows observed values from a distribution which is thought to be modelled
by a geometric distribution.


k 1 2 3 4 5 6 Total
Observed frequency O_k 61 24 11 1 2 1 100


a Use the observed data to estimate p (to 3 d.p.). (2 marks)
b Conduct a goodness-of-fit test at the 5% significance level to determine whether a geometric
distribution is a good fit for the data. (5 marks)

Problem-solving
Use N / Σ(k × O_k) to estimate the parameter.


4 Each day after work Katie flags down a taxi to take her home. She records how many taxis she
tries to flag down before one stops, over the course of 100 days.


Attempts 1 2 3 4 5 Total
Frequency 76 17 4 2 1 100


Katie thinks she can model the number of attempts each day using a geometric random
variable X ~ Geo(p).

a Using the observed frequencies, find an estimate for p (to 3 d.p.). (2 marks) 

b Conduct a goodness-of-fit test at the 2.5% significance level, and say whether a geometric 
random variable is a good model for the data. (5 marks) 


5 Michael has a pet monkey and wants to test the theory that, given a typewriter and enough
time, the monkey will eventually type out the complete works of Shakespeare. Unfortunately
the experiment is a total disaster, and the monkey has succeeded only in producing a seemingly
random string of alphabetic characters!
Michael decides instead he will see how random the string of characters is. He reads the string
looking for vowels. Each time he finds a vowel, he counts the number of letters until the next
vowel. The results are recorded in the table below.


Characters to next vowel 1 2 3 4 5 6 7 8 or more Total
Frequency 12 14 11 10 5 9 7 7 75
Michael believes that every letter of the alphabet has an equal chance of being typed by the 
monkey. 
a Assuming Michael’s belief is accurate, suggest a suitable distribution to model the 
number of letters typed before the next vowel. (2 marks) 
b Conduct a goodness-of-fit test, at the 5% significance level, to see if Michael’s belief 
is supported by the data. (5 marks) 
c Describe one limitation of using this experiment to test Michael’s belief. (1 mark) 


Challenge 


Ellen is trying to make some money, so on the weekends she runs a lemonade 
stand. Each day she makes 10 cups of lemonade and counts how many people 
walk past before she sells all of them. The results are recorded in the table below. 
Number of people 10 11 12 13 14 15 
Frequency 10 25 29 15 15 10 


You may assume that every passer-by has an equal probability of buying a cup 
of lemonade, and that the probabilities do not change from day to day. Suggest 
an appropriate distribution to model the number of people that go past before 
Ellen sells out, and test the fit of your model at the 5% significance level. 


Mixed exercise 6


1 The random variable Y has a χ² distribution with 10 degrees of freedom. Find y such that
P(Y < y) = 0.99.


2 The random variable X has a chi-squared distribution with 8 degrees of freedom. Find x such
that P(X > x) = 0.05.


3 As part of an investigation into visits to a Health Centre, a 5 × 3 contingency table was
constructed. A χ² test of significance at the 5% level is to be carried out on the table.


Write down the number of degrees of freedom and the critical region appropriate to this test. 


4 Data are collected in the form of a 4 × 4 contingency table.
To carry out a χ² test of significance one of the rows is amalgamated with another row and
the resulting value of Σ (O − E)²/E was calculated.


Write down the number of degrees of freedom and the critical value of χ² appropriate to this
test, assuming a 5% significance level.

5 A new drug to treat the common cold was used with a randomly selected group of 100
volunteers. Each was given the drug and their health was monitored to see if they caught a 
cold. A randomly selected control group of 100 volunteers was treated with a dummy pill. 


The results are shown in the table below. 


Cold No cold 
Drug 34 66 
Dummy pill 45 55 


Using a 5% significance level, test whether or not the chance of catching a cold is affected
by taking the new drug. State your hypotheses carefully. (6 marks)


6 Breakdowns on a certain stretch of motorway were recorded each day for 80 consecutive days. 
The results are summarised in the table below. 


Number of breakdowns 0 1 2 >2 
Frequency 38 32 10 0 


It is suggested that the number of breakdowns per day can be modelled by a Poisson 
distribution. 


Using a 5% significance level, test whether or not the Poisson distribution is a suitable
model for these data. State your hypotheses clearly. (9 marks)


7 A survey in a college was commissioned to investigate whether or not there was any association
between gender and passing a driving test. A group of 50 males and 50 females were asked 
whether they passed or failed their driving test at the first attempt. All the students asked had 
taken the test. The results were as follows. 


Pass Fail 
Male 23 27 
Female 32 18 


Stating your hypotheses clearly, test at the 10% level whether or not there is any evidence of an
association between gender and passing a driving test at the first attempt. (6 marks)


8 Successful contestants in a TV game show were allowed to select from one of five boxes, four of
which contained prizes, and one of which contained nothing. The boxes were numbered 1 to 5,
and, when the show had run for 100 weeks, the choices made by the contestants were analysed 
with the following results: 


Box number 1 2 3 4 5 
Frequency 20 16 25 18 21 


a Explain why these data could possibly be modelled by a discrete uniform distribution. 


b Using a significance level of 5%, test to see if the discrete uniform distribution is a good
model in this particular case.

9 A pesticide was tested by applying it in the form of a spray to 50 samples of five flies.
The numbers of dead flies after 1 hour were then counted with the following results:


Number of dead flies 0 1 2 3 4 5
Frequency 1 1 5 11 24 8
a Calculate the probability that a fly dies when sprayed. (2 marks) 


b Using a significance level of 5%, test to see if these data could be modelled by a 
binomial distribution. (5 marks) 


10 The number of accidents per week at a certain road junction was monitored for four years.
The results obtained are summarised in the table.


Number of accidents 0 1 2 >2
Frequency 112 56 40 0


Using a 5% level of significance, carry out a χ² test of the hypothesis that the number
of accidents per week has a Poisson distribution. (9 marks)


11 Samples of stones were taken at two sites on a beach which were 1 mile apart. The rock
types of the stones were found and classified as igneous, sedimentary or other types, with the
following results.

Rock type | Site: A | B
Igneous | 30 | 10
Sedimentary | 55 | 35
Other | 15 | 15


A scientist believes that the distribution of rock types at site A can be used as a model 
for the distribution at site B. Test this belief, using a 5% significance level. (6 marks) 


12 A small shop sells a particular item at a fairly steady yearly rate. When looking at the weekly
sales it was found that the number sold varied. The results for the 50 weeks the shop was open 
were as shown in the table. 


Weekly sales 0 
repens ope pa pe pope pt pe pepe 
a Find the mean number of sales per week. (2 marks) 


b Using a significance level of 5%, test to see if these can be modelled by a Poisson 
distribution. (8 marks) 


13 A study was done of how many students in a college were left-handed and how many were
right-handed. As well as left- or right-handedness the gender of each person was also recorded 
with the following results. 

Left-handed Right-handed 
Male 100 600 
Female 80 800 


Use a significance test at the 0.05 level to see if there is an association between gender 
and left- and right-handedness. (6 marks) 
14 A school science department collected data on which science subject students found the most 
interesting. A random sample of 300 students gave the following results. 


Gender    Physics   Biology   Chemistry
Male        74        28        68
Female      45        40        45


A test is carried out at the 1% level of significance to determine whether or not there is an 
association between gender and preferred subject. 


a State the null hypothesis for this test. (1 mark) 
b Show that the expected frequency for females choosing biology is 29.47 (to 2 d.p.). (1 mark) 
c Calculate the remaining expected frequencies, and the test statistic for this test. (3 marks) 
d State whether or not the null hypothesis should be rejected. Justify your answer. (2 marks) 
e Would the null hypothesis be rejected if instead the test were carried out at the 5% level of 
significance? (1 mark) 
The discrete random variable X follows a Poisson distribution with mean 2.15. 
a Write down the values of: 

i P(X = 1) 

ii P(X > 2) (2 marks) 


The manager at a call centre recorded the number of calls coming in each minute between noon 
and 1 pm. 


Number of calls 0 1 2 3 4 5 6 Total 
Frequency 10 12 14 12 8 3 1 60 
b Show that the average number of calls received in a minute is 2.15. (1 mark) 


The manager believes that the Poisson distribution may be a good model for the number of 
calls arriving each minute. She uses a Poisson distribution with mean 2.15 to calculate expected 
frequencies as follows. 


Number of calls 0 1 2 3 4 5 6 or more 
Expected frequency 6.99 15.03 a 11.58 6.22 2.67 b 
c Find the values of a and b to two decimal places. (2 marks) 


The manager will test, at the 5% level of significance, if the data can be modelled by a Poisson 
distribution as she suspects. 


d State the null and alternative hypotheses for this test. (1 mark) 
e Explain why the last two cells in the expected frequency table should be combined 
when calculating the test statistic for this test. (1 mark) 


f Calculate the test statistic and state the conclusion for this test. State clearly the critical 
value used in the test. (4 marks) 
David doesn’t have a car. If it’s raining in the morning when he has to go to work, he calls his 
friends one by one to see if anyone can give him a lift. Since starting his job it has rained on 255 
mornings. David has recorded the number of calls he had to make on each of these mornings 
before finding a willing driver. 


Number of calls 1 2 3 4 5 6 7 Total 
Frequency 130 54 24 28 13 5 1 255 


Assume that each of David’s friends is equally likely to offer him a lift, and that David will 
never run out of friends to call. (David is extremely popular.) 


a Suggest a suitable distribution to model the number of calls made by David on 
a rainy morning. (1 mark) 
b Use the observed data to estimate any parameters necessary for your chosen 
distribution. (2 marks) 
c Carry out a goodness-of-fit test at the 5% significance level for your chosen 
distribution. (6 marks) 


Wilfred calls his parents every weekend to tell them about his week. Unfortunately Wilfred is 
very forgetful, and can’t remember if the last digit of his parents’ phone number is 2, 5 or 7. 
When he wants to call his parents, he simply guesses the last number, and waits to see if his 
parents answer. If they don’t he tries again. 


The number of times Wilfred dials before he gets through to his parents throughout the year is 
recorded in the table below. 


Number of attempts 1 2 3 4 7 Total 
Frequency observed 27 10 10 4 1 52 


Wilfred claims that the number of attempts made each time he calls his parents can be 
modelled as a geometric distribution with probability of success 1/3. 


a Give one criticism of Wilfred's model. (1 mark) 
b Test Wilfred's claim at the 5% significance level. (6 marks) 


Challenge 


A random sample of 500 phone calls to a call centre revealed the 
following distribution of call length (in minutes). 


Length of call, l   0 ≤ l < 5   5 ≤ l < 10   10 ≤ l < 15   15 ≤ l < 20   20 ≤ l < 25
Frequency           1           63           221           177           32


a Estimate the mean and variance of the call lengths. 


b Using the mean and variance calculated in part a, test at the 5% 
level of significance whether call length can be modelled by a normal 
distribution. 
Summary of key points 


1 The null and alternative hypotheses generally take the following form: 
H0: There is no difference between the observed and the theoretical distribution. 
H1: There is a difference between the observed and the theoretical distribution. 


2 Goodness of fit is concerned with measuring how well an observed frequency distribution 
fits to a known distribution. 


3 The measure of goodness of fit is $X^2 = \sum \frac{(O_i - E_i)^2}{E_i}$ 


4 The $\chi^2$ family of distributions can be used to approximate $X^2$ as long as none of the expected 
values is below 5. 


5 When calculating degrees of freedom: 

ν = number of cells after combining − number of constraints 


6 When using chi-squared tests, if any of the expected values are less than 5, then you have to 
combine frequencies in the data table until they are greater than 5. 


7 When selecting which of the $\chi^2$ family to use as an approximation for $X^2$, you have to select 
the distribution which has ν equal to the number of degrees of freedom of your expected 
values. 


8 If $X^2$ exceeds the critical value, it is unlikely that the null hypothesis is correct so you reject it 
in favour of the alternative hypothesis. 


9 If n is the number of cells after combining: 

                    Degrees of freedom
Distribution        Parameters known   Parameters not known
Discrete uniform    n − 1
Binomial            n − 1              n − 2
Poisson             n − 1              n − 2
Geometric           n − 1              n − 2


10 For contingency tables: 

expected frequency = (row total × column total) / grand total 

for an h × k table, the number of degrees of freedom ν = (h − 1)(k − 1) 
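
The contingency-table formulae above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the data are the observed frequencies from the science-subject question earlier in this exercise (rows Male and Female, columns Physics, Biology and Chemistry).

```python
def expected_frequencies(observed):
    """Expected frequency of each cell = row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed, expected):
    """X^2 = sum over all cells of (O - E)^2 / E."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed frequencies: rows are Male, Female; columns are Physics, Biology, Chemistry.
observed = [[74, 28, 68], [45, 40, 45]]
expected = expected_frequencies(observed)
x2 = chi_squared_statistic(observed, expected)

# Degrees of freedom for an h x k table: (h - 1)(k - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(expected[1][1], 2))  # expected frequency for females choosing biology
print(x2, dof)
```

The statistic would then be compared with the critical value of the $\chi^2$ distribution with `dof` degrees of freedom, taken from tables. The sketch does not check that every expected frequency is at least 5.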
After completing this chapter you should be able to: 
● Understand the use of probability generating functions 
● Use probability generating functions for standard distributions 
● Use probability generating functions to find the mean and variance of a distribution 
● Know the probability generating function of the sum of independent random variables 


1 X ~ Po(4.5). Find: 
a P(X = 5) b P(X < 4) 
c E(X) d Var(X) ← Chapter 2 


2 Y~ Geo(). Find: 
aR (ye=25) b P(Y 2 3) 
c E(Y) d Var(Y) € Chapter 3 


3 W ~ Negative B(5, 0.35). Find: 
a P(W = 13) b P(W ≤ 12) 
c E(W) d Var(W) ← Chapter 3 


Probability generating functions are used by 
actuaries to calculate risk in order to advise 
insurance companies what premiums to 
charge customers. — Mixed exercise 7, Q5 


4 Find f'(x) for each function: 
a f(x) =(@3+e*)4 


b f(x) =e*(1 — 2x2) © Pure Year 2, Chapter 9 


Probability generating functions 


7.1 Probability generating functions 


A probability generating function (p.g.f.) is a mathematical function that stores details of a 
probability distribution. It can only be used with a discrete probability distribution that takes 
non-negative integer values, such as the binomial or Poisson distributions. 


■ If a discrete random variable X has probability mass function P(X = x), then the probability 
generating function of X is given by 

$G_X(t) = \sum P(X = x)\,t^x$, where t is a dummy variable. 


Hint: You sum $t^x P(X = x)$ over all possible values in the sample space of X. 


You can see how this function works by considering an example: 
The discrete random variable X has the probability distribution shown in the table below: 


x 0 1 2 3 
P(X = x) 0.2 0.3 0.3 0.2 


The probability generating function of X is 
$G_X(t) = 0.2t^0 + 0.3t^1 + 0.3t^2 + 0.2t^3$ 
where the coefficients of $t^x$ are the probabilities P(X = x). 


When t = 1, the terms of the generating function add up to 1 since $\sum P(X = x) = 1$. This is an 
important property of any probability generating function. 


■ For any probability generating function, $G_X(1) = 1$. 

The probability generating function can also be defined in terms of the expectation of a function of 
the random variable. This definition is given in the formulae booklet in your exam. 


■ The probability generating function of a discrete random variable X is given by 

$G_X(t) = E(t^X)$ 

Links: $E(g(X)) = \sum g(x)P(X = x)$, so $E(t^X) = \sum t^x P(X = x)$ ← Section 1.3 
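
As a rough illustration of the definition, the sketch below stores a distribution as a Python dictionary and evaluates the probability generating function as the sum of P(X = x) t^x. The function name `pgf` and the dictionary representation are choices made for this example only. The distribution is the one from the table above.

```python
from fractions import Fraction


def pgf(pmf, t):
    """Evaluate G_X(t) = sum of P(X = x) * t**x over the sample space of X."""
    return sum(p * t ** x for x, p in pmf.items())


# Distribution from the table: P(X = 0) = 0.2, P(X = 1) = 0.3, P(X = 2) = 0.3, P(X = 3) = 0.2.
pmf = {
    0: Fraction(2, 10),
    1: Fraction(3, 10),
    2: Fraction(3, 10),
    3: Fraction(2, 10),
}

print(pgf(pmf, 1))               # the probabilities sum to 1, so G_X(1) = 1
print(pgf(pmf, 0))               # only the constant term survives, giving P(X = 0)
print(pgf(pmf, Fraction(1, 2)))  # value of E(t^X) at t = 1/2
```

Exact fractions are used so that the check G_X(1) = 1 is not affected by rounding.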


Example 


X is the discrete random variable that denotes the absolute difference of the scores when two 
fair dice are thrown. Construct the probability distribution of X and write down the probability 
generating function. 


Use a sample space diagram to find the possible outcomes: 

     1  2  3  4  5  6
1    0  1  2  3  4  5
2    1  0  1  2  3  4
3    2  1  0  1  2  3
4    3  2  1  0  1  2
5    4  3  2  1  0  1
6    5  4  3  2  1  0

Now write down the probability distribution in a table: 

x          0      1       2      3      4      5
P(X = x)   6/36   10/36   8/36   6/36   4/36   2/36

Check that $\sum P(X = x) = 1$ and do not simplify each probability. 


$G_X(t) = \sum P(X = x)\,t^x$ so 

$G_X(t) = \frac{6}{36} + \frac{10}{36}t + \frac{8}{36}t^2 + \frac{6}{36}t^3 + \frac{4}{36}t^4 + \frac{2}{36}t^5$ 

$= \frac{1}{6} + \frac{5}{18}t + \frac{2}{9}t^2 + \frac{1}{6}t^3 + \frac{1}{9}t^4 + \frac{1}{18}t^5$ 


This is the probability generating function for X. 


The probability generating function of the discrete random variable X is given by 

$G_X(t) = k(1 + t)^2$ 
a Find the value of k. 
b Write down the probability distribution of X. 


a $G_X(1) = k(1 + 1)^2 = 4k = 1$, so $k = \frac{1}{4}$ 

Problem-solving: Use the property that $G_X(1) = 1$. Substitute t = 1 
into the expression for $G_X(t)$ and set it equal to 1, 
then solve to find k. 


b $G_X(t) = \frac{1}{4}(1 + 2t + t^2)$ 
$= \frac{1}{4} + \frac{1}{2}t + \frac{1}{4}t^2$ 

Expand the right-hand side with $k = \frac{1}{4}$. 

x          0     1     2
P(X = x)   1/4   1/2   1/4

The x values are the powers of t and the 
probabilities are the coefficients of each term. 


Exercise 7A 


1 The discrete random variable X has probability generating function 


G,(t) = 0.3 + 0.22 + 0.57 
a Write down the sample space of X. 


b Find: 
i P(X=0) ii P(Y= 0) 


2 The discrete random variable X has probability generating function 


Gy) = 701 +93 
a Write down the sample space of X. 


b Find: 
i PX =I) ii P(X = 2) 
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The discrete random variable Y has probability generating function 
Gy(t) = 0.7 + 0.117 + 242) 
Find: 
a P(Y=1)) b P(Y <3) ¢ PG = 7=6) 


X is the discrete random variable representing the score when a fair dice is rolled. Write down 
the probability generating function of X. 


X is the discrete random variable representing the score on a tetrahedral dice. The dice is biased 
such that P(X = 1) = 0.4. Given that the other three outcomes are equally likely, write down the 
probability generating function of X. 


Find the probability generating function for each of these distributions: 


a P(X=x)=75 x=1,2,3,4 b P(Y¥=x)=~— x=1,2,3 


The probability generating function of a discrete random variable Y is given by 

Gy(t) =k(2+t+ 0?) 
a Find the value of k. (2 marks) 
b Find P(Y= 1). (2 marks) 


The probability generating function of a discrete random variable X is given by 

Gy(t) = k(1 + 2+ 27°) 
a Find the value of k. (2 marks) 
b Write down the probability distribution of X. (3 marks) 


X is the discrete random variable that denotes the sum of the scores when two fair four-sided 
dice are rolled. 


a Construct the probability distribution for X. 
b Write down the probability generating function for X. 


A student writes the following probability generating function for a discrete random variable Y 


$G_Y(t) = 0.1(2t + 5t^2 + 4t^3)$ 


Explain why this is not a probability generating function. 


The discrete random variable Y has probability generating function: 
Gy(t) = k(1 + 1)! 
a Find the value of k. 
b Write down the largest value that Y can take and the probability that it takes this value. 
c Find P(Y=5). 
d State the name of the distribution of Y. 
7.2 Probability generating functions of standard distributions 


The standard probability distributions that you have studied have particularly simple probability 
generating functions. These are given in the formulae booklet but you should also be able to derive 
them in simple cases. 


An archer hits the bullseye with probability 0.6. She fires three shots at a target. The random 
variable X represents the number of times she hits the bullseye. 

Find the probability generating function of X. 


X ~ B(3, 0.6) 

This is a binomial model with n = 3 and p = 0.6. You are assuming that 
each shot is an independent event. 

Write out the probability distribution of X: 

x          0       1                 2                 3
P(X = x)   0.4³    3 × 0.4² × 0.6    3 × 0.4 × 0.6²    0.6³

$G_X(t) = \sum P(X = x)\,t^x$ 
$= 0.4^3 + 3 \times 0.4^2 \times 0.6t + 3 \times 0.4 \times 0.6^2 t^2 + 0.6^3 t^3$ 
$= (0.4 + 0.6t)^3$ 

The expansion of $G_X(t)$ is a binomial expansion in the form $(a + b)^n$ with 
a = 0.4 and b = 0.6t. 


■ If a discrete random variable X ~ B(n, p) the probability generating function for X is given by 
$G_X(t) = (1 - p + pt)^n$ 
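
One way to check the binomial result numerically is to expand (1 − p + pt)^n by repeated polynomial multiplication and compare the coefficients with the binomial probabilities. The sketch below does this for the archer example (n = 3, p = 0.6). The helper names are invented for this illustration.

```python
from math import comb, isclose


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    result = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            result[i + j] += ai * bj
    return result


def binomial_pgf_coefficients(n, p):
    """Coefficients of (1 - p + p t)^n, built by multiplying n linear factors."""
    coeffs = [1.0]
    for _ in range(n):
        coeffs = poly_mul(coeffs, [1 - p, p])
    return coeffs


n, p = 3, 0.6
coeffs = binomial_pgf_coefficients(n, p)

# Binomial probabilities P(X = x) = C(n, x) p^x (1 - p)^(n - x) for comparison.
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Each coefficient of t^x should agree with P(X = x).
print(all(isclose(c, q) for c, q in zip(coeffs, pmf)))
```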


The Poisson distribution is theoretically an infinite distribution but the probability generating 
function can be derived using the idea of an infinite series. 


X is a discrete random variable such that X ~ Po(1.1). 
Show, from first principles, that the probability generating function of X is given by 

$G_X(t) = e^{1.1(t - 1)}$ 

Watch out: 'From first principles' means that you cannot quote the standard result for the 
p.g.f. of a Poisson random variable given in the formulae booklet. 


Write down the probability distribution of X: 

$P(X = x) = \frac{e^{-1.1} \times 1.1^x}{x!}$ 

$G_X(t) = \sum t^x P(X = x)$ 

$= \sum \frac{e^{-1.1} \times 1.1^x t^x}{x!}$ 

$= e^{-1.1} \sum \frac{(1.1t)^x}{x!}$ 

$e^{-1.1}$ is constant so you can write it outside the summation. 

$= e^{-1.1}\left(1 + 1.1t + \frac{(1.1t)^2}{2!} + \frac{(1.1t)^3}{3!} + \dots\right)$ 

$= e^{-1.1} \times e^{1.1t} = e^{1.1(t - 1)}$ 

Problem-solving: The bracketed expression is the Maclaurin 
expansion of $e^x$ where x = 1.1t. 
← Core Pure Book 2, Chapter 2 

■ If a discrete random variable X ~ Po(λ) the probability generating function for X is given by 
$G_X(t) = e^{\lambda(t - 1)}$ 
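
The Poisson result can be checked numerically by truncating the infinite sum. In the sketch below the choice of 60 terms and the function names are assumptions made for illustration. For small values of λ the remaining terms are negligible.

```python
from math import exp, factorial, isclose


def poisson_pgf_partial_sum(lam, t, terms=60):
    """Truncated sum of t^x P(X = x) for X ~ Po(lam), using the first `terms` terms."""
    return sum(exp(-lam) * lam ** x / factorial(x) * t ** x for x in range(terms))


def poisson_pgf_closed_form(lam, t):
    """Closed form e^(lam (t - 1))."""
    return exp(lam * (t - 1))


lam = 1.1
for t in (0.0, 0.5, 1.0):
    # The truncated series and the closed form should agree to within rounding error.
    print(t, poisson_pgf_partial_sum(lam, t), poisson_pgf_closed_form(lam, t))
```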


Example 5 


A fair tetrahedral dice is rolled until the dice shows a four. The discrete random variable X 
represents the number of rolls up to and including the first instance of a four. 


Show, from first principles, that the probability generating function for X is given by 

$G_X(t) = \frac{\frac{1}{4}t}{1 - \frac{3}{4}t}$ 


$X \sim \text{Geo}(\frac{1}{4})$ 

The distribution is geometric with parameter $\frac{1}{4}$. 

Write down the probability distribution of X: ← Chapter 3 

$P(X = x) = \frac{1}{4}\left(\frac{3}{4}\right)^{x - 1}$, x = 1, 2, 3, ... 

$G_X(t) = \sum t^x \times \frac{1}{4}\left(\frac{3}{4}\right)^{x - 1}$ 

$= \frac{1}{4}t\left(1 + \frac{3}{4}t + \left(\frac{3}{4}t\right)^2 + \left(\frac{3}{4}t\right)^3 + \dots\right)$ 

The bracketed expression is the sum to infinity of 
a geometric series with first term 1 and common 
ratio $\frac{3}{4}t$. ← Pure Year 2, Section 3.5 

$= \frac{1}{4}t \times \frac{1}{1 - \frac{3}{4}t} = \frac{\frac{1}{4}t}{1 - \frac{3}{4}t}$ 

This could be simplified to $\frac{t}{4 - 3t}$ but in the form 
given it is clear to see the relationship between 
G and p. 


■ If X is a geometrically distributed discrete random variable with probability of success in 
any one trial p, the probability generating function for X is given by 

$G_X(t) = \frac{pt}{1 - (1 - p)t}$ 


The discrete random variable X ~ Negative B(r, p). Prove, from first principles, that the probability 
generating function of X can be written as 

$G_X(t) = \left(\frac{pt}{1 - (1 - p)t}\right)^r$ 

You may quote the following result without proof: 

$\sum_{x = r}^{\infty} \binom{x - 1}{r - 1} q^{x - r} = (1 - q)^{-r}$ where q = 1 − p. 


Write down the probability distribution of X: 

$P(X = x) = \binom{x - 1}{r - 1} p^r q^{x - r}$, x = r, r + 1, r + 2, ... 

Use 1 − p = q. 

$G_X(t) = \sum_{x = r}^{\infty} \binom{x - 1}{r - 1} p^r q^{x - r} t^x$ 

$= \sum_{x = r}^{\infty} \binom{x - 1}{r - 1} p^r q^{x - r} t^r t^{x - r}$ 

Split the t term into two parts which match the powers of p and q. 

$= (pt)^r \sum_{x = r}^{\infty} \binom{x - 1}{r - 1} (qt)^{x - r}$ 

$(pt)^r$ can be taken out of the summation since it is independent of x. 

$= (pt)^r (1 - qt)^{-r}$ 

$= \left(\frac{pt}{1 - qt}\right)^r = \left(\frac{pt}{1 - (1 - p)t}\right)^r$ as required. 

Problem-solving: Since X is an infinite discrete random variable 
you are required to find the infinite sum of the 
binomial coefficients. Use the result that 
$\sum \binom{x - 1}{r - 1} q^{x - r} = (1 - q)^{-r}$ as given in the 
question. 


■ If X is a discrete random variable with negative binomial distribution, the probability 
generating function for X is given by 

$G_X(t) = \left(\frac{pt}{1 - (1 - p)t}\right)^r$ 
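
The geometric and negative binomial results can be checked together, since the negative binomial p.g.f. is the geometric p.g.f. raised to the power r. The sketch below compares the closed forms with a truncated version of the defining sum. The number of terms (400) and the function names are assumptions for this illustration, and the comparison only makes sense when |(1 − p)t| < 1 so that the series converges.

```python
from math import comb, isclose


def geometric_pgf(p, t):
    """Closed form p t / (1 - (1 - p) t) for X ~ Geo(p)."""
    return p * t / (1 - (1 - p) * t)


def negative_binomial_pgf(r, p, t):
    """Closed form (p t / (1 - (1 - p) t))^r for X ~ Negative B(r, p)."""
    return geometric_pgf(p, t) ** r


def negative_binomial_partial_sum(r, p, t, terms=400):
    """Truncated sum of C(x - 1, r - 1) p^r q^(x - r) t^x for x = r, r + 1, ..."""
    q = 1 - p
    return sum(
        comb(x - 1, r - 1) * p ** r * q ** (x - r) * t ** x
        for x in range(r, r + terms)
    )


# r = 1 reduces to the geometric distribution, as in the tetrahedral dice example.
print(negative_binomial_partial_sum(1, 0.25, 0.5), geometric_pgf(0.25, 0.5))

# A negative binomial case with r = 3 and p = 0.4.
print(negative_binomial_partial_sum(3, 0.4, 0.9), negative_binomial_pgf(3, 0.4, 0.9))
```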


Exercise 7B 


1 Write down the probability generating functions Hint | _ 

for the following distributions: - p a ee asks you to ee oe 
or ‘find’ a probability generating function for 

ea, ee) a standard distribution, then you can quote 
ec X~ BG, 0.9) d X ~ Po(3) the standard formulae without proof. 
e X ~ Po(1.7) f Y~ Po(0.2) 
Write down the probability generating functions for the following distributions: 
a X ~ Geo(0.3) b Y ~ Geo(0.8) 
c X ~ Negative B(3, 0.4) d Y~ Negative B(5, 0.9) 


A dice is biased so that P(6) = 0.2. Find the probability generating function of each of the 
following random variables: 


a the number of sixes obtained when the dice is rolled 5 times 
b the number of times the dice must be thrown until it shows a six for the first time 
c the number of times the dice must be thrown until it shows a six for the second time 


A sail-maker notices that the flaws in a roll of sailcloth occur at an average rate of 0.3 per metre. 
a Suggest a suitable model for the random variable X, the number of flaws in a metre 
of cloth. (1 mark) 
b Find P(X = 1). (1 mark) 
c Write down the probability generating function for X. (2 marks) 

Bernice is playing darts and she finds that the probability of hitting a treble score in any one 
throw is 0.35. 


a Suggest a suitable model for the random variable X, the number of throws it takes her to hit 
a treble. (1 mark) 
b Find P(X = 6). (1 mark) 
c Write down the probability generating function for X. (2 marks) 


X ~ B(4, 0.8). Show, from first principles, that the probability generating function for X is 
$G_X(t) = (0.2 + 0.8t)^4$ (5 marks) 


Calls come in to a call centre at a rate of 3.5 calls per five-minute interval. Given that the 
random variable X is the number of calls that come in during a random five-minute interval 
and that the calls are independent and random, show, from first principles, that the probability 
generating function for X is 

$G_X(t) = e^{3.5(t - 1)}$ (5 marks) 


Y ~ Geo(0.7). Show, from first principles, that 
the probability generating function for Y is 

$G_Y(t) = \frac{0.7t}{1 - 0.3t}$ (5 marks) 

Problem-solving: Use the formula for the sum of an infinite 
convergent geometric series. 


The random variable X ~ B(n, p). Prove, from first principles, that the probability generating 
function of X is given by 

$G_X(t) = (1 - p + pt)^n$ (8 marks) 


The random variable X ~ Po(λ). Prove, from first principles, that the probability generating 
function of X is given by 

$G_X(t) = e^{\lambda(t - 1)}$ (8 marks) 


The random variable Y ~ Geo(p). Prove, from first principles, that the probability generating 
function of Y is given by 

$G_Y(t) = \frac{pt}{1 - (1 - p)t}$ (8 marks) 


7.3 Mean and variance of a distribution 


You can find the mean and variance of a probability distribution by differentiating the probability 
generating function. 


$G_X(t) = E(t^X) = \sum t^x P(X = x)$ 
$G'_X(t) = \sum x t^{x - 1} P(X = x) = E(X t^{X - 1})$ 
$\Rightarrow G'_X(1) = \sum x P(X = x) = E(X)$ 


■ If X is a discrete random variable with probability generating function $G_X(t)$, 

$E(X) = G'_X(1)$ 

$G'_X(t) = \sum x t^{x - 1} P(X = x) = E(X t^{X - 1})$ 
$\Rightarrow G''_X(t) = \sum x(x - 1) t^{x - 2} P(X = x) = E(X(X - 1) t^{X - 2})$ 
$\Rightarrow G''_X(1) = \sum x(x - 1) P(X = x) = E(X(X - 1)) = E(X^2) - E(X)$ 
$\Rightarrow E(X^2) = G''_X(1) + E(X) = G''_X(1) + G'_X(1)$ 
You know that $\text{Var}(X) = E(X^2) - (E(X))^2$. Hence: 


■ If X is a discrete random variable with probability generating function $G_X(t)$, 

$\text{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$ 

You may be asked to prove these two standard results in your exam. 

You can see the above results more clearly by differentiating $G_X(t)$ term-by-term: 

If $G_X(t) = P(X = 0) + P(X = 1)t + P(X = 2)t^2 + \dots + P(X = n)t^n + \dots$ 

Then $G'_X(t) = P(X = 1) + 2P(X = 2)t + 3P(X = 3)t^2 + \dots + nP(X = n)t^{n - 1} + \dots$ 

So $G'_X(1) = \sum x P(X = x)$ 

And $G''_X(t) = 2P(X = 2) + 6P(X = 3)t + 12P(X = 4)t^2 + \dots + n(n - 1)P(X = n)t^{n - 2} + \dots$ 

So $G''_X(1) = \sum x(x - 1)P(X = x)$ 

Hint: You can see from these expressions that: 
G(0) = P(X = 0) 
G'(0) = P(X = 1) 
G''(0) = 2P(X = 2) 
In general, $G^{(n)}(0) = n!\,P(X = n)$, where $G^{(n)}$ is the nth derivative of $G_X$. 
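
For a distribution with finitely many values the p.g.f. is a polynomial, so the two results above can be applied mechanically. The sketch below is a minimal illustration with invented helper names. It uses the distribution of the absolute difference of the scores on two fair dice, with probabilities 6/36, 10/36, 8/36, 6/36, 4/36 and 2/36 for x = 0, 1, ..., 5.

```python
from fractions import Fraction


def derivative(coeffs):
    """Differentiate a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def evaluate(coeffs, t):
    """Evaluate the polynomial c0 + c1 t + c2 t^2 + ... at t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def mean_and_variance(coeffs):
    """Return (G'(1), G''(1) + G'(1) - G'(1)^2) for a polynomial p.g.f."""
    g1 = evaluate(derivative(coeffs), 1)
    g2 = evaluate(derivative(derivative(coeffs)), 1)
    return g1, g2 + g1 - g1 ** 2


# Coefficient of t^x is P(X = x) for the absolute difference of two fair dice.
coeffs = [Fraction(n, 36) for n in (6, 10, 8, 6, 4, 2)]
mean, variance = mean_and_variance(coeffs)
print(mean, variance)
```

Using `Fraction` keeps the arithmetic exact, which makes it easier to compare the output with a hand calculation of E(X) and E(X²) − (E(X))².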


The discrete random variable X has a probability generating function given by 

$G_X(t) = \frac{1}{100\,000}(9 + t^2)^5$ 

Find: a E(X) b Var(X) 

a $G_X(t) = \frac{1}{100\,000}(9 + t^2)^5$ 

$G'_X(t) = \frac{10t}{100\,000}(9 + t^2)^4 = \frac{t(9 + t^2)^4}{10\,000}$ 

Use the chain rule to find $G'_X(t)$. 

$G'_X(1) = \frac{10^4}{10\,000} = 1$ 

Hence E(X) = 1 

State the value of E(X) clearly. 

b $G''_X(t) = \frac{9(1 + t^2)(9 + t^2)^3}{10\,000}$ 

Find the second derivative of $G_X(t)$. 

$\text{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$ 

Write down the formula for variance. 

$\text{Var}(X) = \frac{9(1 + 1)(9 + 1)^3}{10\,000} + 1 - 1^2 = \frac{18\,000}{10\,000} = \frac{9}{5}$ 

Use the value of $G'_X(1)$ calculated above. 


Example 


A discrete random variable X has a probability generating function given by G,(t) = a + bt + ct? 
where a, b and ¢ are constants. Given that the expected value of X is ¢ and the variance of X is =. 
find the values of a, b and c. 


— SoP(¥=x)=1 


Gy(1)=1>a+b+c=1 (1) - 
Giy(t) = b + 2ct 
>Gy(l)=b+2c=2 (2) 


Use the formula for the expected value to write 
an equation in terms of 6 and c. 
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G"y() = 2ce=13c=4- Use the formula for the variance to find c. 


Exercise 7C 
1 A discrete random variable X has probability generating function G,(f) = = + it + $02. 


Use this to find the expected value and variance of X. 


A discrete random variable X has probability generating function Gy(t) = - + +t + +P + +13, 
Use this to find the mean and standard deviation of X. 


Three unbiased coins are spun and the number of tails, X, is noted. 
a Write down the probability generating function for X. 
b Use your probability generating function to calculate the mean and variance of X. 


A biased coin is spun four times. If P(head) = 0.6, 

a write down the probability generating function of X, the discrete random variable 
representing the number of heads spun. 

b Using your probability generating function from a, find: 
i the mean of X 
ii the standard deviation of X. 


1?(2 + t)4 
A discrete random variable XY has probability generating function Gy(?t) = a 
Find the mean and variance of YX. (8 marks) 
A discrete random variable X has probability generating function Gy(t) = aa 
a Find the mean and standard deviation of X. (6 marks) 


b Find: i P(Y=0) ii PX =1) (2 marks) Hint ) Consider the series expansion of Gy(é). 


A discrete random variable Y has probability generating function Gy(ft) =e" !. 
a Find the mean and variance of Y. (6 marks) 
b Find: i P(Y=0) ii P(Y =2) iii P( Y = 3) iv P(Y =4) (4 marks) 


A bag contains six counters, five red and one yellow. Counters are drawn out and the colour 
noted before being replaced. Let X represent the number of withdrawals until the yellow 
counter is drawn. 


a State the distribution of X. (1 mark) 
b Write down the probability generating function of X. (1 mark) 


c Using your answer to part b, find: 
i the mean of X ii the variance of X. (7 marks) 
The discrete random variable X ~ B(n, p). Use the probability generating function of X to show 
that E(X) = np and Var(X) = np(1 − p). 


The discrete random variable X ~ Po(λ). Use the probability generating function of X to show 
that E(X) = Var(X) = λ. 


A call centre receives incoming calls at an average rate of four per 15-minute period. 
Assuming that the incoming calls are independent and random, 


a state the distribution of X, the number of calls received in a single 15-minute 
period (1 mark) 
b write down the probability generating function of X. (1 mark) 
c Using your answer to part b, show that: 
i the mean of X is 4 (3 marks) 
ii the standard deviation of X is 2. (4 marks) 


A discrete random variable X has probability generating function G,(t) = a + bt + ct? 
where a, b and c are constants. Given that E(X) = 5 and Var(X) = Zz, find the values of 
a, band c. 


The discrete random variable XY has a probability generating function given by 
Gy(t) = Se where a and + are positive constants. 


Given that the mean of YX is 1.5, find the values of a and b. (6 marks) 


A discrete random variable X has probability generating function $G_X(t) = k(1 + t)^4$ where k 
is a constant. 


a Find the value of k. (1 mark) 
b Write down the probability distribution represented by G. (1 mark) 
c By explicitly using the probability generating function and showing all steps in your 
working, show that E(X) = 2 and Var(X) = 1. (6 marks) 


Two fair dice are thrown and the random variable X, the smaller of the two numbers, is 
recorded. 


a Find the probability generating function of X. (3 marks) 
b Use your answer to part a to find: 
i the mean of X (3 marks) 
ii the standard deviation of X. (4 marks) 


Challenge 


The random variable X has probability generating function 


given by Gy(t) = 


t 
2 1% 


where k is a positive integer. 


Find: 
a P(Y¥=0) b E(X) in terms of k @ POr= i) 
7.4 Sums of independent random variables 


Consider two dice. Dice A has three faces showing the number 1 and three faces showing the 
number 2. Dice B has two faces showing the number 0 and four faces showing the number 1. 


The random variables X¥ and Y represent the scores on dice A and on dice B respectively. 
The probability distributions of the two dice are shown below: 


Dice A: 

x          1     2
P(X = x)   1/2   1/2

Dice B: 

y          0     1
P(Y = y)   1/3   2/3


The probability generating functions of X and Y are: 
$G_X(t) = \frac{1}{2}t + \frac{1}{2}t^2$ and $G_Y(t) = \frac{1}{3} + \frac{2}{3}t$ 


Given that the outcomes on the two dice are independent, the distribution of Z, the random variable 
representing the sum of the scores on the two dice, Z = X + Y, can be worked out: 

z          1     2     3
P(Z = z)   1/6   1/2   1/3

Z = 2 can occur in two ways: X = 2 and Y = 0, or 
X = 1 and Y = 1. So $P(Z = 2) = \frac{1}{2} \times \frac{1}{3} + \frac{1}{2} \times \frac{2}{3} = \frac{1}{2}$ 


The probability generating function of Z is: 

$G_Z(t) = \frac{1}{6}t + \frac{1}{2}t^2 + \frac{1}{3}t^3$ 

If you find the product of the probability generating functions of X and Y, you will find that the 
resulting function is the same as the probability generating function of Z: 

$G_X(t) \times G_Y(t) = \left(\frac{1}{2}t + \frac{1}{2}t^2\right)\left(\frac{1}{3} + \frac{2}{3}t\right)$ 

$= \frac{1}{6}t + \frac{1}{3}t^2 + \frac{1}{6}t^2 + \frac{1}{3}t^3$ 

$= \frac{1}{6}t + \frac{1}{2}t^2 + \frac{1}{3}t^3$ 


■ If X and Y are two independent random variables with probability generating functions $G_X(t)$ 
and $G_Y(t)$, the probability generating function of Z = X + Y is given by 

$G_Z(t) = G_X(t) \times G_Y(t)$ 


You will not need to be able to prove this result in your exam. 
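
Multiplying two polynomial p.g.f.s is the same operation as combining the two distributions term by term, which the following sketch illustrates for dice A and dice B. The function name `pgf_of_sum` and the coefficient-list representation are assumptions made for this example.

```python
from fractions import Fraction


def pgf_of_sum(g_x, g_y):
    """Multiply two p.g.f.s given as coefficient lists, where index k holds P(value = k)."""
    g_z = [Fraction(0)] * (len(g_x) + len(g_y) - 1)
    for i, a in enumerate(g_x):
        for j, b in enumerate(g_y):
            # The term a t^i times b t^j contributes a*b to the coefficient of t^(i + j).
            g_z[i + j] += a * b
    return g_z


# Dice A: P(X = 1) = P(X = 2) = 1/2, so G_X(t) = (1/2)t + (1/2)t^2.
g_x = [Fraction(0), Fraction(1, 2), Fraction(1, 2)]
# Dice B: P(Y = 0) = 1/3 and P(Y = 1) = 2/3, so G_Y(t) = 1/3 + (2/3)t.
g_y = [Fraction(1, 3), Fraction(2, 3)]

g_z = pgf_of_sum(g_x, g_y)
print(g_z)  # coefficients of t^0, t^1, t^2, t^3 in G_Z(t)
```

The result only represents the distribution of X + Y when X and Y are independent, as stated in the rule above.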


Example 


Two independent discrete random variables X and Y have probability generating functions 
Gy(t) =4+4t and Gy() =445t +20. 

a Find the probability generating function of the random variable Z = X + Y. 

b Use probability generating functions to show that E(Z) = E(X) + E(Y). 
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= (3 + $i($ + St + Zr?) 
=E+gt+ pt? + ie+ He? + G3 
=G tial t gt? + pot? | 
b G'y(1) = E(X) 

G'y(t) =F > E(X) =F 

G'y(1) = E(Y) 

Gy) =t+H30¥Y)=44+4=2 

G'z(1) = E(Z) 

G7)=34+ +5 SHZ)=s+S4+5 


Use E(X) + E(Y) from part b. 


en 
= 3 aS required. 


Expand and simplify. 


Differentiate the expression from part a. 


A random variable X has a probability generating function G,(f) = 5 + +t. 
a Write down the probability distribution of X. 
b Y=2X +1. Write down the probability distribution of Y and hence find the probability 


generating function of Y. 
c Verify that $G_Y(t) = tG_X(t^2)$. 


The probability generating function of Y is 
Gy(t) = St + $18 


© Gy(t) = t(§ + $12) = tGy(t?) as required. 


Use the p.g.f. to write down the probabilities of 
the possible outcomes. 


You can generalise the example above to find the probability generating function for a linear 
transformation of a random variable X. 


■ If the discrete random variable X has probability generating 
function $G_X(t)$, then the probability generating function of 
the discrete random variable Y = aX + b, where a and b are 
positive integers, is given by 

$G_Y(t) = t^b G_X(t^a)$ 

Watch out: This result is not given in the formulae booklet, 
so you need to learn it. 
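
The linear transformation rule can be checked on a small case by building the distribution of Y = aX + b directly and comparing its p.g.f. with t^b G_X(t^a). In the sketch below the distribution of X (probabilities 1/4, 1/2, 1/4 on 0, 1, 2) and the values a = 2, b = 1 are chosen purely for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def pgf(pmf, t):
    """Evaluate sum of P(X = x) * t**x for a distribution stored as {value: probability}."""
    return sum(p * t ** x for x, p in pmf.items())


def transform_pmf(pmf, a, b):
    """Distribution of Y = aX + b: each value x moves to a*x + b with the same probability."""
    return {a * x + b: p for x, p in pmf.items()}


# Illustrative distribution of X and illustrative constants a and b.
pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 2, 1
pmf_y = transform_pmf(pmf_x, a, b)

for t in (Fraction(1, 3), Fraction(2, 3), Fraction(1)):
    lhs = pgf(pmf_y, t)                # G_Y(t) from the distribution of Y
    rhs = t ** b * pgf(pmf_x, t ** a)  # t^b G_X(t^a)
    print(t, lhs, rhs)
```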
Exercise 7D 


Two independent random variables X and Y have probability generating functions given by 
Gy(t) =4t +40 +40 and G,(f) =? + 40. 

a Find the probability generating function for the random variable Z = X + Y. 

b Show that E(Z) = E(X) + E(Y). 


Two independent random variables X and Y have probability generating functions given by 
Gy(t) = 41 + 0)? and Gy(t) = (2 + 302. 


a Write down the probability distributions of X and Y. (2 marks) 
b Show that the probability generating function of Z = X + Y can be written in the form 
$G_Z(t) = a + bt + ct^2 + dt^3 + et^4$ where a, b, c, d and e are constants to be found. (3 marks) 
c Verify that E(Z) = E(X) + E(Y). (4 marks) 


X ~ Po(1.3) and Y ~ Po(2.4). 


a Write down the probability generating functions for X and Y. (2 marks) 
b Given that X and Y are independent, write down the probability generating function 
for Z = X + Y. (1 mark) 
c Use your answers to parts a and b to show that E(Z) = E(X) + E(Y). (4 marks) 


Jacintha is rolling a fair six-sided dice until a five appears. 

a Show from first principles that the probability generating function for the number of rolls 
required is given by $G(t) = \frac{t}{6 - 5t}$ (4 marks) 

Henry rolls a fair ten-sided dice until two fives have appeared. 

b Write down the probability generating function for the number of rolls required to 
obtain two fives. (1 mark) 


The random variable Z represents the total number of rolls made on both dice. 
c Find the probability generating function of Z. (2 marks) 
d Show that E(Z) = 26. (4 marks) 


A random variable X has a probability generating function G,(t) = k(1 + 22). 

a Find the value of k. (2 marks) 
b Find P(X = 2). (2 marks) 
c Use the probability generating function to show that E(Y) = 2 and Var(Y) = = (6 marks) 
A second random variable Y has a probability generating function Gy(t) = - + +t. 

Given that X and Y are independent, 

d find E(Y) and write down the value of E(X¥ + Y). (3 marks) 
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A random variable X has a probability generating function Gy(t) = G-» 4 i? 


t 


A second random variable Y has a probability generating function Gy(f) = 3-203 


Given that X¥ and Y are independent, 

a write down the probability generating function for the random variable Z7=X+Y (1 mark) 
b find E(Z) (4 marks) 
c show that Var(Z) = = (6 marks) 


Aidan and Chloe each buy 5 scratchcards for different lottery games. The probability of 
winning a prize on each of Aidan’s scratchcards is 0.3. The probability of winning a prize on 
each of Chloe’s scratchcards is 0.4. 


The random variable X represents the total number of prize-winning scratchcards. 
a Find an expression for the probability generating function of X. (4 marks) 
b Show that the mean of X is 3.5. (3 marks) 


A random variable X has a probability generating function G,(t) = st + $83, 
Find the probability generating functions for the following random variables: 
a Y = 3X b Y = 2X + 3 c Y = 4X − 5 


A random variable X has probability generating function Gy(t) = +t + ra + kt}, where k 
is a constant. 


a Write down the value of k. (1 mark) 
b Find E(X). (3 marks) 
A random variable Y= 2X - 1. 

c Find the probability generating function of Y. (2 marks) 
d Verify, using your answers to parts b and c, that E(Y) = 2E(X) - 1. (3 marks) 


Challenge 


1 Adiscrete random variable Y = aX + b. Given that Gy(d) = 1°Gy (24), 


show that in general E(Y) = aE(X) + b. 


2 Holly is an archer who hits the bullseye with constant probability 0.6. 
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a Find the probability generating function G(¢) of the number of 
shots she needs until she hits her first bullseye. 

b Show that the probability generating function of the number of 
shots she needs until she hits her second bullseye is (G(t))². 

c Find, in terms of G(t), the probability generating function of the 
number of shots she needs until she hits four bullseyes. 


Mixed exercise 


The probability generating function of a discrete random variable Y is given by 

G_Y(t) = k(2 + 2t + 3t²) 
a Find the value of k. (2 marks) 
b Find P(Y=1). (2 marks) 
A second random variable, X, has probability generating function 

Gy(t) = Gell + + 20)? 
Given that X and Y are independent, find: 
c the probability generating function of Z = X + Y (2 marks) 
d P(Z=2) (4 marks) 


The discrete random variable X ~ Geo(p). Use the probability generating function of X to 
show that E(X) = 1/p and Var(X) = (1 − p)/p². (5 marks) 
X ~ B(5, 0.4). Show, from first principles, that the probability generating function for X is: 

G_X(t) = (0.6 + 0.4t)^5 (5 marks) 


A box of cat treats contains 15 treats, 11 meaty and 4 fishy. Fluffy the cat selects a treat at 
random. If the treat is meaty, he spits it back into the box. Let X represent the number of 
selections until Fluffy selects a fishy treat. 


a State the distribution of X. (1 mark) 
b Write down the probability generating function of X. (2 marks) 
c Using your answer to part b, find: 
i the mean of X (3 marks) 
ii the variance of X. (4 marks) 


Once Fluffy has selected a fishy treat from the first box, he repeats the process with a second 
box. The second box contains 12 treats, 7 meaty and 5 fishy. The random variable Z represents 
the total number of selections needed by Fluffy to get a fishy treat from both boxes. 


d Write down the probability generating function of Z. (2 marks) 
e Find the mean and standard deviation of Z. (6 marks) 


A car insurance company models the number of claims, X, a particular person will make in one 
year, using a Poisson distribution with mean 0.5. 
a Find, in terms of e, the probability that this person will make: 

i no claims 

ii at least three claims 

in a given year. (5 marks) 
The policy is adjusted so a maximum of 3 claims can be made in any one year. The random 
variable Y represents the number of claims made. 
b Show that the probability generating function of Y is given by 

G_Y(t) = t³ + e^(−0.5)(1 + t/2 + t²/8 − 13t³/8) (3 marks) 
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c Use your probability generating function to find, correct to 3 decimal places, the values of 


E(Y) and Var(Y). (8 marks) 
The discrete random variable XY has probability generating function G,(f) = ae 
a Find the mean and standard deviation of X. (8 marks) 
b Find: 

i P(X = 0) ii P(X = 2) (2 marks) 
A second discrete random variable Y has probability generating function Gy(t) = an 
c Given that X and Y are independent, find the probability generating functions for: 

i 2Y-1 ii Z=X+Y (3 marks) 
d Show that E(Z) = 14. (4 marks) 
The probability generating function of a discrete random variable X is given by 

G_X(t) = k(1 + 2t² + 3t³)² 

a Show that k = 1/36 (2 marks) 
b Find P(X = 4). (2 marks) 
c Show that E(X) = 13/3 and find Var(X). (6 marks) 
d Find the probability generating function of 3X − 2. (2 marks) 


Challenge 


A discrete random variable X has probability generating function 

G_X(t) = tan(πt/4) 

a Show that: 
i P(X = 0) = 0 ii P(X = 1) = π/4 iii P(X = 2) = 0 
b Show that E(X) = Var(X) = π/2 
c Find P(X = 3). 




Summary of key points 


1 If a discrete random variable X has probability mass function P(X = x), then the probability 
generating function of X is given by G_X(t) = Σ P(X = x) t^x, where t is a dummy variable. 

2 For any probability generating function, G_X(1) = 1. 

3 The probability generating function of a discrete random variable X is given by 
G_X(t) = E(t^X) 

4 The probability generating functions for the following standard distributions are given in the 
table below, where C(n, x) denotes the binomial coefficient "n choose x". 

Distribution of X | P(X = x) | P.G.F. 
Binomial B(n, p) | C(n, x) p^x (1 − p)^(n − x) | G_X(t) = (1 − p + pt)^n 
Poisson Po(λ) | e^(−λ) λ^x / x! | G_X(t) = e^(λ(t − 1)) 
Geometric Geo(p) | p(1 − p)^(x − 1) | G_X(t) = pt / (1 − (1 − p)t) 
Negative binomial Negative B(r, p) | C(x − 1, r − 1) p^r (1 − p)^(x − r) | G_X(t) = (pt / (1 − (1 − p)t))^r 

5 If X is a discrete random variable with probability generating function G_X(t), 
• E(X) = G′_X(1) 
• Var(X) = G″_X(1) + G′_X(1) − (G′_X(1))² 

6 If X and Y are two independent random variables with probability generating functions G_X(t) 
and G_Y(t), the probability generating function of Z = X + Y is given by G_Z(t) = G_X(t) × G_Y(t). 

7 If the discrete random variable X has probability generating function G_X(t), then the 
probability generating function of the discrete random variable Y = aX + b, where a and b are 
positive integers, is given by 

G_Y(t) = t^b G_X(t^a) 
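The key points above can be checked numerically. The sketch below is illustrative only: it stores a probability generating function as a Python dictionary mapping each power of t to its coefficient, and the helper names (pgf_mean_var, pgf_product, pgf_linear) and the fair-die example are assumptions made for this illustration, not part of the text.

```python
from fractions import Fraction

def pgf_mean_var(pgf):
    """Return (E(X), Var(X)) for a PGF stored as {power of t: coefficient}."""
    g1 = sum(x * p for x, p in pgf.items())            # G'(1)
    g2 = sum(x * (x - 1) * p for x, p in pgf.items())  # G''(1)
    return g1, g2 + g1 - g1 ** 2

def pgf_product(gx, gy):
    """PGF of Z = X + Y for independent X and Y: G_Z(t) = G_X(t) * G_Y(t)."""
    gz = {}
    for i, p in gx.items():
        for j, q in gy.items():
            gz[i + j] = gz.get(i + j, 0) + p * q
    return gz

def pgf_linear(gx, a, b):
    """PGF of Y = aX + b: t^b * G_X(t^a), so the power x moves to a*x + b."""
    return {a * x + b: p for x, p in gx.items()}

# Hypothetical example: the score on a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
mean, var = pgf_mean_var(die)
two_dice = pgf_product(die, die)
print(mean, var)                               # 7/2 35/12
print(two_dice[7])                             # 1/6
print(pgf_mean_var(pgf_linear(die, 2, 3))[0])  # 10
```

Exact fractions are used so that G_X(1) = 1 and E(2X + 3) = 2E(X) + 3 can be confirmed without rounding error.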
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Quality of tests 


After completing this chapter you should be able to: 
● Know about Type I and Type II errors → pages 147-153 
● Find Type I and Type II errors using the normal distribution → pages 153-157 
● Calculate the size and power of a test → pages 157-162 
● Draw a graph of the power function for a test → pages 162-167 
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» Prior knowledge check 


1 Daily mean temperature in a UK town is modelled as 
X ~ N(μ, 2.3²). 
The mean of a random sample of 20 recorded mean 
daily temperatures taken in 2015 is 11.1 °C. Test 
whether μ is greater than 10 °C at the 1% level of 
significance. State your hypotheses clearly. 


Hypothesis tests can sometimes 
lead to incorrect conclusions. You 
can analyse hypothesis tests to 
work out how reliable they are. 

This information is very important 
when using hypothesis testing to 
determine the efficacy of new drugs 
and medical procedures. 

— Exercise 8B, Q5 


← Statistics and Mechanics Year 2, Chapter 3 
2 A single observation is taken from each distribution 
and used to test H₀ against H₁. Find the critical 
region for each test. 
a X ~ Po(λ), H₀: λ = 5 against H₁: λ ≠ 5 using a 10% 
level of significance. ← Sections 5.1, 5.2 


b Y ~ Geo(p), H₀: p = 0.15 against H₁: p > 0.15 using 
a 5% level of significance. ← Sections 5.3, 5.4 


8.1 Type I and Type II errors 


When you carry out a hypothesis test, you make an assumption about the distribution of a test 
statistic. You then compare the probability of the observed result occurring with the significance 
level of the test, and decide whether to accept or reject this assumption. This example illustrates a 
hypothesis test based on the parameter, p, of a binomial distribution. 


One rainy day during the summer holidays, a family of four were playing a simple game of cards. 
The game was one of chance so the probability of any particular person winning should have 
been 1/4. After playing a number of games, Robert complained that his younger sister Sarah must 
have been cheating as she kept winning. Their parents quickly intervened and decided to carry out 
a proper investigation and carefully watched the next 20 games. 


Find the critical region for a one-tailed test using a 5% level of significance. 


H₀: p = 1/4   H₁: p > 1/4 
Let X = the number of games Sarah wins out 
of the next 20. 


So X ~ B(20, 1/4) 
Reject H₀ if X ≥ c where P(X ≥ c) < 0.05. 
From tables: 
P(X ≤ 8) = 0.9591 so P(X ≥ 9) = 0.0409 
P(X ≤ 7) = 0.8982 so P(X ≥ 8) = 0.1018 


So the critical region is X ≥ 9. 


In the example above, if Sarah wins 9 or more games, her parents will reject the null hypothesis, and 
conclude that p > 1/4 (or in other words, that Sarah was cheating). It is possible that this conclusion 
will be incorrect. If p = 1/4, Sarah might still win 9 or more games by chance. The probability of this 
occurring is 0.0409, which is the actual significance level of the test. Rejecting H₀ when it is in fact 
true is called a Type I error. 


■ A Type I error is when you reject H₀, but H₀ is in fact true. The probability of a Type I error is 
the same as the actual significance level of the hypothesis test. 


It is also possible that Sarah was cheating, but that she still only wins 8 or fewer games. In this case 
her parents would accept the null hypothesis, and conclude incorrectly that p = 1/4. This is called a 
Type II error. 


■ A Type II error is when you accept H₀, but H₀ is in fact false. 


Watch out: In order to calculate the probability of a Type II error 
you would need to know the actual value of the parameter p. 
Because H₀ is false, you usually don’t have this information. 


147 


Chapter 8 


This table summarises the types of error that can occur in a hypothesis test: 


                               | Truth: H₀ is true | Truth: H₀ is false 
Conclusion of test: Accept H₀  | OK                | Type II error 
Conclusion of test: Reject H₀  | Type I error      | OK 


Use the situation in Example 1. 
a Find the probability of a Type I error. 
b If in fact Sarah was cheating and p = 0.35, find the probability of a Type II error. 


a Critical region X ≥ 9 
P(Type I error) = P(rejecting H₀ when H₀ is true) 
= P(X ≥ 9 | X ~ B(20, 0.25)) 
= 0.0409 


b P(Type II error) = P(accepting H₀ when H₀ is false) 

Given that p = 0.35, 

P(Type II error) = P(X ≤ 8 | X ~ B(20, 0.35)) 
= 0.7624 
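The table look-ups in Examples 1 and 2 can be reproduced with a short calculation. This is a minimal sketch using only the Python standard library; the function names binom_cdf and upper_critical_value are chosen for this illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def upper_critical_value(n, p0, alpha):
    """Smallest c such that P(X >= c | p = p0) <= alpha."""
    for c in range(n + 2):
        if 1 - binom_cdf(c - 1, n, p0) <= alpha:
            return c

n, p0, alpha = 20, 0.25, 0.05
c = upper_critical_value(n, p0, alpha)   # critical region is X >= c
type_1 = 1 - binom_cdf(c - 1, n, p0)     # reject H0 although p = 0.25
type_2 = binom_cdf(c - 1, n, 0.35)       # accept H0 although p = 0.35
print(c, round(type_1, 4), round(type_2, 4))
```

The printed values should agree with the critical region X ≥ 9 and the probabilities 0.0409 and 0.7624 found from tables.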


Accidents occurred on a stretch of motorway at an average rate of 6 per month. Many of the 

accidents that occurred involved vehicles skidding into the back of other vehicles. By way of a trial, 

a new type of road surface that is said to reduce the risk of vehicles skidding is laid on this stretch 

of road, and during the first month of operation 4 accidents occurred. 

a Test this result to see if it gives evidence that there has been an improvement at the 5% level of 
significance. 


b Calculate P(Type I error) for this test. 


c If the true average rate of accidents occurring with the new type of road surface was 3.5, 
calculate the probability of a Type II error. 
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a You are dealing with a Poisson distribution. 


Let λ = the average number of accidents in a month, and 
X = the number of accidents in any given month, then the 
hypotheses are 

H₀: λ = 6 (i.e. no change) 

H₁: λ < 6 (i.e. fewer accidents) 
From tables P(X ≤ 4 | λ = 6) = 0.2851. 

This is more than 5% so you do not have enough 
evidence to reject H₀. 
The average number of accidents per month has not 
decreased. 


b In order to reject H₀ you require a value c such that 
P(X ≤ c | λ = 6) < 0.05 
From the table on page 191, with λ = 6: 

P(X ≤ 2) = 0.0620 
and P(X ≤ 1) = 0.0174. 

So the critical value c is 1, and the critical region for 
this test is X ≤ 1. 

A Type I error occurs when you reject H₀ when it is 
true, and the probability of this happening is 
P(X ≤ 1) = 0.0174. 


c A Type II error occurs when you do not have sufficient 
evidence to reject H₀ when H₁ is true. 

If λ = 3.5 then H₀ is not true. You do not have sufficient 
evidence to reject H₀ if X ≥ 2 so 
P(Type II error | λ = 3.5) = P(X ≥ 2 | λ = 3.5) 

= 1 − P(X ≤ 1 | λ = 3.5) 

= 1 − 0.1359 

= 0.8641 
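The same steps can be carried out for the Poisson test without tables. The following sketch assumes a lower-tailed test as in the example above; poisson_cdf and lower_critical_value are names introduced here for illustration.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

def lower_critical_value(lam0, alpha):
    """Largest c with P(X <= c | lam0) <= alpha, or None if there is none."""
    c, k = None, 0
    while poisson_cdf(k, lam0) <= alpha:
        c = k
        k += 1
    return c

c = lower_critical_value(6, 0.05)    # critical region is X <= c
type_1 = poisson_cdf(c, 6)           # reject H0 although lambda = 6
type_2 = 1 - poisson_cdf(c, 3.5)     # fail to reject H0 although lambda = 3.5
print(c, round(type_1, 4), round(type_2, 4))
```

The output should match the critical value 1 and the probabilities 0.0174 and 0.8641 from the worked solution.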


You can also calculate the probabilities of errors from a two-tailed hypothesis test. 


A coin is spun 20 times and a head is obtained on 7 occasions. 

a Test to see whether or not the coin is biased. 

b Calculate the probability of a Type I error for this test. 

c Given that the coin is biased and that this bias causes the tail to appear 3 times for each head 
that appears, calculate the probability of a Type II error for the test. 
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a The hypotheses are 


H₀: p = 0.5   H₁: p ≠ 0.5 

Let X = the number of heads in 20 spins 
of the coin. 

Assuming H₀ is true then X ~ B(20, 0.5). 
For a two-tailed test, at the 5% significance 
level, you require values c₁ and c₂ such that 
P(X ≤ c₁) ≤ 0.025 and P(X ≥ c₂) ≤ 0.025 
(or P(X ≤ c₂ − 1) ≥ 0.975). 

From tables: P(X ≤ 6) = 0.0577 
and P(X ≤ 5) = 0.0207 

so the value of c₁ = 5. 

Also: P(X ≥ 14) = 1 − P(X ≤ 13) 
= 1 − 0.9423 
= 0.0577 
P(X ≥ 15) = 1 − P(X ≤ 14) 
= 1 − 0.9793 
= 0.0207 

so the value of c₂ = 15. 

Thus the critical region for X is X ≤ 5 or X ≥ 15. 

Problem-solving 
Notice that since p = 0.5 the two tails are 
symmetrical about the mean of 10 and the value 
of c₂ could have been inferred from that of c₁ in 
this case. 

As 7 falls between 5 and 15 there is 
insufficient evidence to reject H₀. 
There is no evidence that the coin is biased. 


b A Type I error occurs when you reject H₀ 
but H₀ is true, and this occurs when X ≤ 5 
or X ≥ 15. 
P(Type I error) = P(X ≤ 5 | p = 0.5) 
+ P(X ≥ 15 | p = 0.5) 
= 0.0207 + 0.0207 
= 0.0414 


c A Type II error occurs when you do not 
have sufficient evidence to reject H₀ when 
H₁ is true. You do not have evidence to 
reject H₀ if X ≥ 6 and X ≤ 14, 
i.e. 6 ≤ X ≤ 14. 

P(Type II error) = P(6 ≤ X ≤ 14 | p = 0.25) 
= P(X ≤ 14 | p = 0.25) 
− P(X ≤ 5 | p = 0.25) 
= 1.000 − 0.6172 
= 0.3828 
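For a two-tailed test each tail is handled separately. The sketch below is one possible way to find both critical values, assuming the convention used above that each tail probability must not exceed half the significance level; the function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def two_tailed_region(n, p0, alpha):
    """Return (c1, c2) so that H0 is rejected when X <= c1 or X >= c2."""
    tail = alpha / 2
    c1 = max((k for k in range(n + 1) if binom_cdf(k, n, p0) <= tail),
             default=None)
    c2 = min((k for k in range(n + 1) if 1 - binom_cdf(k - 1, n, p0) <= tail),
             default=None)
    return c1, c2

n = 20
c1, c2 = two_tailed_region(n, 0.5, 0.05)
type_1 = binom_cdf(c1, n, 0.5) + (1 - binom_cdf(c2 - 1, n, 0.5))
# Type II error when the true p is 0.25: X lands strictly between c1 and c2.
type_2 = binom_cdf(c2 - 1, n, 0.25) - binom_cdf(c1, n, 0.25)
print(c1, c2, round(type_1, 4), round(type_2, 4))
```

The result should agree with the critical region X ≤ 5 or X ≥ 15 and with the probabilities 0.0414 and 0.3828.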
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Example 


Jane knows from experience that 10% of the emails she receives are spam. After her email service 
upgraded the spam filters, she recorded the number of emails sent up to and including the first spam 
email. She wants to test, at the 5% significance level, whether this upgrade improved the spam filter. 


a Find the critical region for her test. 
b Calculate the probability of a Type I error for this test. 


c Given that after the upgrade the probability of an email she receives being spam is now 1 in 
100, calculate the probability of a Type II error for the test. 


a Let X = number of emails sent up to and 
including first spam email 
X ~ Geo(p) 
H₀: p = 0.1   H₁: p < 0.1 
Assume H₀, so that X ~ Geo(0.1). 
For a one-tailed test you need to find a 
value c so that P(X ≥ c) ≤ 0.05. 
Since P(X ≥ c) = (1 − 0.1)^(c − 1) = 0.9^(c − 1), 
you need an integer c such that 
0.9^(c − 1) ≤ 0.05 
log 0.9^(c − 1) ≤ log 0.05 
(c − 1) log 0.9 ≤ log 0.05 
c log 0.9 ≤ log 0.05 + log 0.9 
c ≥ (log 0.05 + log 0.9) / log 0.9 
c ≥ 29.4 (3 s.f.) 

Watch out: log 0.9 is negative, so change the 
direction of the inequality when you divide. 

So the critical value is c = 30, and the 
critical region is X ≥ 30. 


b A Type I error occurs when you reject H₀ 
but H₀ is true, and this happens when 
X ≥ 30. 
P(Type I error) = P(X ≥ 30 | p = 0.1) 
= (1 − 0.1)^(30 − 1) 
= 0.0471 (4 d.p.) 

c A Type II error occurs when you don’t have 
enough evidence to reject H₀ and H₁ is 
true. You do not have enough evidence to 
reject H₀ when X ≤ 29. 
P(Type II error) = P(X ≤ 29 | p = 0.01) 
= 1 − 0.99^29 
= 0.2528 (4 d.p.) 
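The logarithm argument for the geometric test translates directly into code. This is a minimal sketch, assuming the upper-tailed critical region used in the example; geo_upper_critical_value is a name introduced here, and taking the ceiling of a floating-point value could be off by one if the bound happened to be an exact integer.

```python
from math import ceil, log

def geo_upper_critical_value(p0, alpha):
    """Smallest integer c with P(X >= c) = (1 - p0)**(c - 1) <= alpha."""
    return ceil(1 + log(alpha) / log(1 - p0))

c = geo_upper_critical_value(0.1, 0.05)   # critical region is X >= c
type_1 = (1 - 0.1) ** (c - 1)             # P(X >= c | p = 0.1)
type_2 = 1 - (1 - 0.01) ** (c - 1)        # P(X <= c - 1 | p = 0.01)
print(c, round(type_1, 4), round(type_2, 4))
```

The critical value comes out as 30, with a Type I error probability of about 0.047 and a Type II error probability of about 0.253.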
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Exercise & 
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The random variable X is binomially distributed. A sample of 10 is taken, and it is desired to 
test H₀: p = 0.25 against H₁: p > 0.25, using a 5% level of significance. 


a Calculate the critical region for this test. 


b State the probability of a Type I error for this test and, given that the true value of p was 
later found to be 0.30, calculate the probability of a Type II error. 


The random variable X is binomially distributed. A sample of 20 is taken, and it is desired to 
test H₀: p = 0.30 against H₁: p < 0.30, using a 1% level of significance. 


a Calculate the critical region for this test. 


b State the probability of a Type I error for this test and, given that the true probability was 
later found to be 0.25, calculate the probability of a Type II error. 


The random variable X is binomially distributed. A sample of 10 is taken, and it is desired to 
test H₀: p = 0.45 against H₁: p ≠ 0.45, using a 5% level of significance. 


a Calculate the critical region for this test. 


b State the probability of a Type I error for this test and, given that the true probability was 
later found to be 0.40, calculate the probability of a Type II error. 


The random variable X has a Poisson distribution. A sample is taken, and it is desired to test 
H₀: λ = 6 against H₁: λ > 6, using a 5% level of significance. 
a Find the critical region for this test. 


b Calculate the probability of a Type I error and, given that the true value of λ was later found 
to be 7, calculate the probability of a Type II error. 


The random variable X has a Poisson distribution. A sample is taken, and it is desired to test 
H₀: λ = 4.5 against H₁: λ < 4.5, using a 5% level of significance. 


a Find the critical region for this test. 


b Calculate the probability of a Type I error and, given that the true value of λ was later found 
to be 3.5, calculate the probability of a Type II error. 


The random variable X has a Poisson distribution. A sample is taken, and it is desired to test 
H₀: λ = 9 against H₁: λ ≠ 9, using a 5% level of significance. 


a Find the critical region for this test. 


b Calculate the probability of a Type I error and, given that the true value of λ was later found 
to be 8, calculate the probability of a Type II error. 


The random variable X is geometrically distributed, and it is desired to test H₀: p = 0.2 against 
H₁: p < 0.2, using a 5% level of significance. 


a Calculate the critical region for this test. 


b State the probability of a Type I error for this test and, given that the true probability was 
found to be p = 0.05, calculate the probability of a Type II error. 

The random variable X is geometrically distributed, and it is desired to test H₀: p = 0.02 against 
H₁: p < 0.02, using a 1% level of significance. 


a Calculate the critical region for this test. 


b State the probability of a Type I error for this test and, given that the true probability was 
found to be p = 0.01, calculate the probability of a Type II error. 


The random variable X is geometrically distributed, and it is desired to test H₀: p = 0.01 against 
H₁: p > 0.01, using a 5% level of significance. 


a Calculate the critical region for this test. 


b State the probability of a Type I error for this test and, given that the true probability was 
found to be p = 0.1, calculate the probability of a Type II error. 


a Define: 
i a Type I error (1 mark) 
ii a Type II error. (1 mark) 


The discrete random variable X ~ Geo(p). You wish to test H₀: p = 0.004 against H₁: p ≠ 0.004, 
using a 10% significance level. The probability in each tail should be as close to 0.05 as possible. 
b Find the critical region for this test. (7 marks) 
c State the probability of a Type I error occurring for this test. (1 mark) 


Michael has bought a dice with 20 sides, and his friend David suspects that it is landing on 17 
more often than it is landing on the other values. They both decide to test this in two different 
ways, using a 5% significance level. Michael throws the dice 40 times and records the number of 
times the dice lands on the 17. 


a Find the critical region for Michael’s test. (4 marks) 
b State the probability of a Type I error occurring for Michael’s test. (1 mark) 
David decides to throw the dice until the first time it lands on 17. 

c Find the critical region for David’s test. (4 marks) 
d State the probability of a Type I error occurring. (1 mark) 
The actual probability of the dice landing on 17 is 0.0588. 

e Calculate the probability of a Type II error occurring in David’s test. (2 marks) 
f Calculate the probability of a Type II error occurring in Michael’s test. (2 marks) 


8.2 Finding Type I and Type II errors using the normal distribution 


You need to be able to find Type I and Type II errors using the normal distribution. 


If you are carrying out a hypothesis test for 
the mean of a normal distribution, you will be given 
the value for the population standard deviation, σ, or 
variance, σ². The variance of the sample mean for a sample of 
size n will be σ²/n. 
← Statistics and Mechanics Year 2, Section 3.7 

In the examples in the previous section P(Type I error), which gives the actual significance level, was 
not equal to the target significance level. This was due to the discrete nature of the distributions used. 


■ When a continuous distribution such as the normal distribution is used then P(Type I error) 
is equal to the significance level of the test. 


Example 


Bags of sugar having a nominal weight of 1 kg are filled by a machine. From past experience it is 
known that the weight, X kg, of sugar in the bags is normally distributed with a standard deviation 
of 0.04 kg. At the beginning of each week a random sample of 10 bags is taken in order to see if 
the machine needs to be reset. A test is then done at the 5% significance level with 
H₀: μ = 1.00 kg and H₁: μ ≠ 1.00 kg. Find: 

a the critical region for this test 
b P(Type I error) for this test. 

Assuming that the mean weight has in fact changed to 1.02 kg, 

c find P(Type II error) for this test. 

Online: Explore probabilities of Type I and Type II errors in a normal distribution using GeoGebra. 


a The distribution of X̄ is modelled by N(1.0, 0.04²/10). 
Since this is a two-tailed test you allow 2.5% at each tail. 
From the tables the critical region for Z is Z < −1.96 or Z > 1.96. 
The critical region is found by rearranging |x̄ − μ| / (σ/√n) > 1.96 for μ = 1.0, σ = 0.04 and n = 10. 
The critical values for X̄ are given by 
x̄ = 1.0 ± 1.96 × √(0.04²/10) 
= 0.9752 and 1.0248 
The critical region is X̄ ≤ 0.9752 or X̄ ≥ 1.0248. 
Notice once again that the critical region is in two parts. 

[Figure: distribution of X̄ under H₀, centred on 1.0, with the Type I error regions below 0.9752 and above 1.0248.] 

b P(Type I error) for this test will be the 
same as the significance level = 0.05. 

c The area required for a Type II error lies 
between X̄ = 0.9752 and X̄ = 1.0248 
given that X̄ is modelled by N(1.02, 0.04²/10). 
The probability of a Type II error is 
given by 
P(0.9752 < X̄ < 1.0248) = 0.6476 

[Figure: distribution of X̄ centred on 1.02, with the Type II error region between 0.9752 and 1.0248.] 

Use the normal cumulative distribution function 
on your calculator, with σ = √(0.04²/10) = 0.01264... 
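The calculator steps in this example can be mirrored with Python's statistics.NormalDist. This is a sketch under the example's assumptions (known σ, two-tailed test); the variable names are illustrative. Because the critical values are not rounded before use, the final probability may differ from 0.6476 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 1.00, 0.04, 10, 0.05
se = sigma / sqrt(n)                      # standard deviation of the sample mean

z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for a two-tailed 5% test
lower, upper = mu0 - z * se, mu0 + z * se
print(round(lower, 4), round(upper, 4))   # critical values for the sample mean

# Type II error: the sample mean falls between the critical values
# although the true mean is 1.02.
true_mean_dist = NormalDist(mu=1.02, sigma=se)
type_2 = true_mean_dist.cdf(upper) - true_mean_dist.cdf(lower)
print(round(type_2, 4))
```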
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When carrying out hypothesis tests, you want to keep P(Type | error) and P(Type Il error) as low as 
possible. The following example illustrates the relationship between Type | and Type II errors. 


The weight of jam in a jar, measured in grams, is distributed normally with a mean of 150 g and 
a standard deviation of 6g. The production process occasionally leads to a change in the mean 
weight of jam per jar but the standard deviation remains unaltered. 


The manager monitors the production process and for every new batch takes a random sample of 25 
jars and weighs their contents to see if there has been any reduction in the mean weight of jam per jar. 


Find the critical values for the test statistic X̄, the mean weight of jam in a sample of 25 jars, using: 
a a 5% level of significance 

b a 1% level of significance. 

Given that the true value of μ for the new batch is in fact 147, 

c find the probability of a Type II error for each of the above critical regions. 


a H₀: μ = 150 
H₁: μ < 150 (i.e. a one-tailed test) 
X̄ ~ N(150, 6²/25) as n = 25 and σ = 6 
The 5% critical region for Z is 
Z ≤ −1.6449 so reject H₀ if 
(X̄ − 150) / (6/√25) ≤ −1.6449 

That is, the critical region for X̄ is 
X̄ ≤ (6/√25) × (−1.6449) + 150 
so X̄ ≤ 148.0262. 


b The 1% critical region for Z is 
Z ≤ −2.3263 so reject H₀ if 
(X̄ − 150) / (6/√25) ≤ −2.3263 

That is, the critical region for X̄ is 
X̄ ≤ (6/√25) × (−2.3263) + 150 
so X̄ ≤ 147.20844. 


c 5% test P(Type II error) 
= P(X̄ > 148.026... | μ = 147) 
= 0.1963 (4 d.p.) 

1% test P(Type II error) 
= P(X̄ > 147.2084 | μ = 147) 
= 0.4311 (4 d.p.) 

Notice how in this example if we try to reduce P(Type I error) from 5% to 1% then P(Type II error) 
increases from 0.1963 to 0.4311. A more detailed study of the interplay between these two 
probabilities follows later in this chapter. However, you should be aware of this phenomenon and 
appreciate one of the reasons why we do not always use a significance level that is very small. 

The value of 5% is a commonly used level and, in a situation where a particular significance level is 
not given, this value is recommended. 


This does not mean that other significance levels are never used. When, for example, the results of 
the research are highly important and making a Type I error could be very serious, a 1% significance 
level might be used. In other cases a significance level of 10% might be used. An alternative method 
of reducing the probability of a Type II error is to increase the sample size but this can increase the 
cost or duration of a survey or experiment. 


The relationship between the probabilities of Type I and Type II errors can be illustrated by imagining 
pushing down on one side of a balloon. 


[Figure: a balloon squeezed on one side. Pushing P(Type I error) down pushes P(Type II error) up, and vice versa.] 


The only way to push down on both sides at once (and reduce the overall thickness) is to allow the 
air to move sideways. Using a larger balloon would allow you to reduce the overall thickness (this is 
equivalent to increasing the size of the sample n). 
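The balloon picture can be made concrete with the jam-jar figures from the previous example (μ₀ = 150, σ = 6, true mean 147). The sketch below is illustrative; type_2_error is a helper defined here, and the sample size 50 is an assumed value added only to show the effect of a larger sample.

```python
from math import sqrt
from statistics import NormalDist

def type_2_error(alpha, n, mu0=150.0, mu_true=147.0, sigma=6.0):
    """P(Type II error) for the test H0: mu = mu0 against H1: mu < mu0."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean is at or below this critical value.
    critical = mu0 + NormalDist().inv_cdf(alpha) * se
    return 1 - NormalDist(mu_true, se).cdf(critical)

for n in (25, 50):
    for alpha in (0.10, 0.05, 0.01):
        print(n, alpha, round(type_2_error(alpha, n), 4))
```

For a fixed sample size, lowering the significance level raises P(Type II error). Increasing n lowers P(Type II error) at every significance level.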


Exercise 8B 


1 The random variable X ~ N(μ, 3²). A random sample of 20 observations of X is taken, and the 
sample mean X̄ is taken to be the test statistic. It is desired to test H₀: μ = 50 against 
H₁: μ > 50, using a 1% level of significance. 


a Find the critical region for this test. 
b State the probability of a Type I error for this test. 
Given that the true mean was later found to be 53, 


c find the probability of a Type II error. 


2 The random variable X ~ N(μ, 2²). A random sample of 16 observations of X is taken, and the 
sample mean X̄ is taken to be the test statistic. It is desired to test H₀: μ = 30 against 
H₁: μ < 30, using a 5% level of significance. 


a Find the critical region for this test. 
b State the probability of a Type I error for this test. 


Given that the true mean was later found to be 28.5, 
c find the probability of a Type II error. 

3 The random variable X ~ N(μ, 4²). A random sample of 25 observations of X is taken, and the 
sample mean X̄ is taken to be the test statistic. It is desired to test H₀: μ = 40 against 
H₁: μ ≠ 40, using a 1% level of significance. 


a Find the critical region for this test. 

b State the probability of a Type I error. 

Given that the true mean was later found to be 42, 
c find the probability of a Type II error. 


(E) 4 A manufacturer claims that the average outside diameter of a particular washer produced by his 
factory is 15mm. The diameter is assumed to be normally distributed with a standard deviation 
of 1mm. The manufacturer decides to take a random sample of 25 washers from each day’s 
production in order to monitor any changes in the mean diameter. 

a Using a significance level of 5%, find the critical region to be used for this test. (4 marks) 
Given that the average diameter had in fact increased to 15.6mm, 


b find the probability that the day’s production would be wrongly accepted. (2 marks) 


(E/P) 5 The number of patients that a medic can inoculate with a vaccine in one day can be modelled by 
a normal distribution with mean 40 and standard deviation 8. The manufacturer of the vaccine 
claims that a new method of inoculation will speed up the rate at which the medic works. 


A random sample of 30 medics tried out the new method of inoculation and the average number 
of patients they dealt with per day, X̄, was recorded. 


a Using a 5% significance level, find the critical value of X̄. (4 marks) 
The average number of patients dealt with per day using the new method of inoculation was in 
fact 42. 

b Find the probability of making a Type II error. (2 marks) 


The manufacturer of the vaccine would like to lessen the probability of a Type II error being 
made and recommends that the significance level be changed. 


c State, giving a reason, what recommendation you would make. (1 mark) 


8.3 Calculate the size and power of a test 


You need to be able to calculate the size and power of a test. 


You have already seen that a Type I error occurs when the null hypothesis is rejected when it is in fact 
true. The probability of a Type I error will be written as α and is often known as the size of the test. 


■ The size of a test is the probability of rejecting the null hypothesis when it is in fact true and 
this is equal to the probability of a Type I error. 


The size of a test, as you have seen, is the actual significance level of the test and this is usually 
chosen before the test is carried out. 


When conducting a hypothesis test you should also be interested in the probability of rejecting the 
null hypothesis when it is in fact untrue, as this is clearly a desirable feature of a test. The probability 
of rejecting the null hypothesis H₀ when it is untrue is known as the power of the test. 

■ The power of a test is the probability of rejecting the null hypothesis when it is not true. 


Power = 1 − P(Type II error) 
= P(being in the critical region when H₀ is false) 


The greater the power of a test, the greater the probability of rejecting H₀ when H₀ is false. It follows 
that the higher the power, the better the test. 


The table on page 148 can now be rewritten to show the probabilities for the different situations. 


                               | Truth: H₀ is true      | Truth: H₀ is false 
Conclusion of test: Accept H₀  | OK                     | P(Type II error) 
Conclusion of test: Reject H₀  | Size = P(Type I error) | Power = 1 − P(Type II error) 


The size and power both relate to rejecting H₀. 
The size relates to when H₀ is true and a Type I error has been made. 
The power relates to when H₀ is false and a correct decision was made. 


If the power is greater than 0.5, the probability of coming to the correct conclusion (rejecting H₀ when 
H₀ is false) is greater than the probability of coming to the wrong conclusion (accepting H₀ when H₀ is 
false). 


On page 156 you were told that, generally, if you increase the sample size the probability of a Type II 
error decreases. It follows that the larger the sample size, the greater the power of the test. Increasing 
the sample size is preferable to increasing the significance level as a way of increasing the power of a 
test. 


Example 


The random variable X has a binomial distribution. A random sample of size 25 was taken to test 
H₀: p = 0.30 against H₁: p < 0.30 using a 10% level of significance. 


a Find the critical region for this test. 
b Find the size of this test. 

Given that p = 0.20, 

c calculate the power of this test. 


a X ~ B(25, p) 
H₀: p = 0.30   H₁: p < 0.30 
Assume H₀ so that X ~ B(25, 0.30). 
H₀ is rejected when X ≤ c where 
P(X ≤ c) ≤ 0.10. 
From tables (use tables of B(25, 0.30)): 

P(X ≤ 4) = 0.0905 
P(X ≤ 5) = 0.1935 

So the critical region is X ≤ 4. 

b Size = P(Type I error) 
= P(X ≤ 4 | p = 0.30) 
= 0.0905 

c If p = 0.20 then H₀ is false. 
Power = P(rejecting H₀ | H₀ is false, i.e. p = 0.2) 
= P(X ≤ 4 | p = 0.20) 
= 0.4207 
Calculate the power directly. There is no need to calculate P(Type II error) 
first. Remember to change the value of p in your 
calculator from 0.30 to 0.20. 
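Size and power are the same calculation evaluated at different values of p, which the following sketch makes explicit. The helper names binom_cdf and power are introduced for this illustration only.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, p0, alpha = 25, 0.30, 0.10
# Largest c with P(X <= c | p0) <= alpha; the critical region is X <= c.
c = max(k for k in range(n + 1) if binom_cdf(k, n, p0) <= alpha)

def power(p):
    """P(reject H0) = P(X <= c) when the true probability is p."""
    return binom_cdf(c, n, p)

size = power(p0)   # the size is the rejection probability when H0 is true
print(c, round(size, 4), round(power(0.20), 4))
```

The output should agree with the critical region X ≤ 4, the size 0.0905 and the power 0.4207 found above.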


Jam is sold in jars. The amount of jam, in grams, in a jar is normally distributed with mean μ and 
standard deviation 5. The manufacturer claims that μ is 106 and quality control officers will take 
action against the manufacturer if μ < 106. A random sample of 30 jars is examined and a 5% level 
of significance is used. 


a Find the critical region for the sample mean using this test. 
Given that in fact w = 102, 
b find the power of this test. 


a H₀: μ = 106   H₁: μ < 106 

  X̄ ~ N(106, 5²/30) 

  Reject H₀ when X̄ < c. 
  Critical region for Z is Z < −1.6449 

  (X̄ − 106) / (5/√30) < −1.6449 
  i.e. X̄ < 104.498... 

b Power = P(X̄ < 104.498... | μ = 102) 
        = 0.9968 (4 d.p.) 
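
The two steps of this example (find the critical value under H₀, then evaluate the rejection probability under the true mean) can be sketched with `statistics.NormalDist` from the Python standard library. The variable names are illustrative. Because the code uses an unrounded critical value, its final decimal place may differ slightly from a value read from tables.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 106, 5, 30, 0.05
se = sigma / sqrt(n)                        # standard deviation of the sample mean

# Critical value c: reject H0 when the sample mean is less than c.
c = NormalDist(mu0, se).inv_cdf(alpha)

# Power: probability of falling in the critical region when mu = 102.
power = NormalDist(102, se).cdf(c)

print(round(c, 3), round(power, 3))
```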


Example 


A particular mobile-phone provider fails to deliver text messages with probability p. 
Brooke wants to investigate whether p > 0.02. 

Using H₀: p = 0.02 and H₁: p > 0.02, Brooke notes the number of text messages she is able to send 
successfully up until the first failure. If this value is less than or equal to 5 she rejects H₀. If it is 
more than 100 she accepts H₀. If it is more than 5 but less than or equal to 100 she notes the 
number of additional text messages she is able to send successfully up until the next failure. 
She rejects H₀ if this is less than or equal to 5 and accepts it otherwise. 

a Find the size of this test. 
b Calculate the power of this test when p = 0.015. 

a Let X = number of messages sent up to and including the first failure. 
  Then X ~ Geo(p). 
  Assume H₀ is true, so that X ~ Geo(0.02). 
  P(X ≤ 5) = 1 − (1 − 0.02)⁵ 
           = 0.09607... 
  P(5 < X ≤ 100) 
           = P(X ≤ 100) − P(X ≤ 5) 
           = 1 − (1 − 0.02)¹⁰⁰ − 0.09607... 
           = 0.77130... 
  P(H₀ rejected | p = 0.02) 
           = P(X ≤ 5) + P(5 < X ≤ 100) × P(X ≤ 5) 
           = 0.09607... + 0.77130... × 0.09607... 
           = 0.17018... 
  The size of the test is 0.1702 (4 d.p.). 

b Assume p = 0.015, so that Y ~ Geo(0.015). 
  P(Y ≤ 5) = 1 − (1 − 0.015)⁵ = 0.07278... 
  P(5 < Y ≤ 100) 
           = P(Y ≤ 100) − P(Y ≤ 5) 
           = 1 − (1 − 0.015)¹⁰⁰ − 0.07278... 
           = 0.70660... 
  P(H₀ rejected | p = 0.015) 
           = P(Y ≤ 5) + P(5 < Y ≤ 100) × P(Y ≤ 5) 
           = 0.12421... 
  The power of the test when p = 0.015 is 
  0.1242 (4 d.p.). 


Problem-solving 


You can use Geo(0.02) to model the number 

of text messages up to and including the first 
failure. After the first failure, the number of text 
messages up to and including the next failure 
also has distribution Geo(0.02). 
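
The two-stage rejection rule in this example can be written as a single function of p, so the size and the power come from the same calculation. This is a minimal sketch; the names `geo_cdf` and `reject_probability` are illustrative, and X is taken to count messages up to and including the failure, as in the solution above.

```python
def geo_cdf(k: int, p: float) -> float:
    """P(X <= k) for X ~ Geo(p), where X counts trials up to and including the first failure."""
    return 1 - (1 - p) ** k


def reject_probability(p: float) -> float:
    """P(H0 rejected) for the two-stage rule: reject at once if X <= 5,
    take a second count if 5 < X <= 100 and reject if that count is <= 5."""
    first_stage = geo_cdf(5, p)
    continue_stage = geo_cdf(100, p) - geo_cdf(5, p)
    return first_stage + continue_stage * first_stage


size = reject_probability(0.02)      # H0 true
power = reject_probability(0.015)    # p = 0.015

print(round(size, 4), round(power, 4))
```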


Exercise (8c) 


1 The random variable X ~ N(μ, 3²). A random sample of 25 observations of X is taken and the 
sample mean X̄ is taken as the test statistic. It is desired to test H₀: μ = 20 against H₁: μ > 20 
using a 5% significance level. 
a Find the critical region for this test. 
b Given that μ = 20.8, find the power of this test. 

2 The random variable X has a binomial distribution. A sample of 20 is taken from it. It is 
desired to test H₀: p = 0.35 against H₁: p > 0.35 using a 5% significance level. 
a Calculate the size of this test. 
b Given that p = 0.36, calculate the power of this test. 

3 The random variable X has a Poisson distribution. A sample is taken and it is desired to test 
H₀: λ = 4.5 against H₁: λ < 4.5 using a 5% significance level. 
a Find the size of this test. 
b Given that λ = 4.1, find the power of this test. 

(P) 4 A manufacturer claims that a particular rivet produced in his factory has a diameter of 2 mm, 
and that the diameter is normally distributed with a variance of 0.004 mm². 

A random sample of 25 rivets is taken from a day’s production to test whether the mean 
diameter had altered, up or down, from the stated figure. A 5% significance level is to be used 
for this test. 

If the mean diameter had in fact altered to 2.02 mm, calculate the power of this test. (5 marks) 

5 In a binomial experiment consisting of 10 trials the random variable X represents the number 
of successes, and p is the probability of a success. 

In a test of H₀: p = 0.3 against H₁: p > 0.3, a critical region of x ≥ 7 is used. 

Find the power of this test when 
a p = 0.4 (3 marks) 
b p = 0.8. (3 marks) 
c Comment on your results. (1 mark) 

6 Explain briefly what you understand by 
a a Type I error (1 mark) 
b the size of a significance test. (1 mark) 

A single observation is made on a random variable X, where X ~ N(μ, 10). 
The observation, x, is to be used to test H₀: μ = 20 against H₁: μ > 20. The critical region is 
chosen to be x ≥ 25. 

c Find the size of the test. (2 marks) 

7 The random variable X has a geometric distribution. It is desired to test H₀: p = 0.01 against 
H₁: p > 0.01 using a 5% significance level. 

a Find the critical region for this test. 
b Given that p = 0.2, calculate the power of this test. 

8 The random variable X has a geometric distribution. It is desired to test H₀: p = 0.01 against 
H;: p 0.01 using a 5% significance level. 


a Find the critical region for this test. 
b Given that p = 0.02, calculate the power of this test. 


9 The wallpaper produced by a certain manufacturer has defects that occur randomly at a 
constant rate of λ per roll. If λ is thought to be greater than 0.8 then action has to be taken. 

Using H₀: λ = 0.8 and H₁: λ > 0.8, a quality control manager takes a sample of 10 rolls and 
rejects H₀ if there are 12 or more defects. If there are 9 or fewer defects then H₀ is accepted. 
If there are 10 or 11 defects, a second sample of 10 rolls is taken and H₀ is rejected if there are 
8 or more defects in this second sample, otherwise it is accepted. 

a Find the size of this test. (4 marks) 
b Find the power of this test when λ = 1. (3 marks) 

(E/P) 10 A sweet manufacturer makes boxes of jelly beans. The number of jelly beans in each box is 
assumed to be normally distributed with standard deviation 5. 

A consumer group wants to test the manufacturer’s claim that the mean number of jelly beans 
in each box is 80. The group takes repeated samples of 20 boxes and records the mean number 
of jelly beans per box in each sample. 

The random variable X represents the number of samples the group need to take before they 
obtain a sample with a mean less than 79. 

If X < 10 the group rejects the company’s claim. 

a Find the size of this test. (5 marks) 
b Given that the actual mean number of jelly beans in each box is 81, find the power 
of this test. (5 marks) 


Challenge 


A jam factory has an automated system for sealing their jars, and the 
expected probability of error when the machines are well calibrated is 
8%. The jars are sealed and placed into boxes of 60. To see whether the 
machine that sealed all the jars in a specific box needs recalibrating, 

a series of tests is performed. The first box is inspected by taking a 
sample of 20 jars and performing a test, with 5% significance level, 

to see whether the probability of a defective seal is greater than 8%. 

If the first box fails the test they conclude that the machine needs 
recalibrating, but if it passes the test they move on to the second box, 
and perform the same test. Once again, if the box fails the test they 
conclude that the machine needs recalibrating, but if it passes the test 
they move on to the next box, and so on until a box fails the test. 


a What is the maximum number of boxes that can be inspected such 
that the probability of a Type I error is smaller than 10%? 


The factory decides to conclude that the machine does not need 
recalibrating if the first four boxes all pass the test. 


b Given that after the second box the machine is decalibrated, increasing 
the probability of a defective seal to 20%, find the power of the test, 
knowing that only 4 boxes were inspected. You may assume that the 
probability of a defective seal for each jar in the first two boxes is 8%. 


8.4 The power function 


So far you have calculated the probability of a Type II error or the power only when you have been 
given a particular value of the population parameter of interest. Population parameters are seldom 
known, and if they were known there would be little point in doing the test anyway. Sometimes past 
experience can give you some idea of likely values of the parameters but, in general, since you do not 
know the value of the parameter, you cannot decide the power of the test concerned. It is, however, 
possible in these cases to calculate the power as a function of the relevant parameter (which we shall 
generalise as θ). Such a function is known as a power function. 

■ The power function of a test is the function of the parameter θ which gives the probability that 
the test statistic will fall in the critical region of the test if θ is the true value of the parameter. 

A power function enables you to calculate the power of the test for any given value of θ, and thus to 
plot a graph of power against θ. 


Past experience has shown that the number of accidents that take place at a road junction has a 
Poisson distribution with an average of 3.5 accidents per month. A trading estate is built along 
one of the roads leading away from the junction and the local council is anxious that this may have 
increased the accident rate. To see if the number of accidents had increased, a test was set up with 
the null hypothesis H₀: λ = 3.5 and with the alternative hypothesis being accepted if the number of 
accidents X within the first month after the alteration was ≥ 7. 

a Find the size of the test. 

b Find the power function for the test and sketch the graph of the power function. 


a Size of test = P(reject H₀ when it is true) 
               = P(X ≥ 7 | X ~ Po(3.5)) 
               = 1 − 0.9347 = 0.0653 

b Power function = P(reject H₀ when it is false) 
                 = 1 − P(X ≤ 6 | X ~ Po(λ)) 
                 = 1 − e^(−λ)(1 + λ + λ²/2 + λ³/6 + λ⁴/24 + λ⁵/120 + λ⁶/720) 

  Problem-solving: You do not know the value of λ. Your power function will be given in terms 
  of this unknown parameter. 

  This enables values of the power of the test to be calculated for different values of λ. 
  λ = 4 gives power = 0.1107 
  λ = 5 gives power = 0.2378 
  λ = 6 gives power = 0.3937 
  λ = 7 gives power = 0.5503 
  λ = 8 gives power = 0.6866 
  λ = 9 gives power = 0.7932 
  λ = 10 gives power = 0.8699 
  The graph is as shown below. 
  [Figure: graph of power against λ] 
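
A power function such as this one is easy to tabulate by computer. The sketch below uses only the Python standard library; the names `poisson_cdf` and `power` are illustrative. It evaluates 1 − P(X ≤ 6 | X ~ Po(λ)) at λ = 3.5 (the size of the test) and at the integer values of λ from 4 to 10.

```python
from math import exp, factorial


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam)."""
    return exp(-lam) * sum(lam**r / factorial(r) for r in range(k + 1))


def power(lam: float) -> float:
    """Power function of the test that rejects H0 when X >= 7."""
    return 1 - poisson_cdf(6, lam)


size = power(3.5)   # the power function evaluated at the H0 value is the size
print("size", round(size, 4))
for lam in range(4, 11):
    print(lam, round(power(lam), 4))
```

Plotting the printed pairs gives the graph of power against λ.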
Power functions are particularly useful when comparing two different tests. 


■ When comparing two tests of comparable size, you should recommend the test with the 
higher power within the likely range of the parameter. 


Example 


A manufacturer of sweets supplies a mixed assortment of chocolates in a jar. He claims that 40% 
of the chocolates have a ‘hard centre’, the remainder being ‘soft centred’. 

A shopkeeper does not believe the manufacturer’s claim and proposes to test it using the following 
hypotheses. 

H₀: p = 0.4   H₁: p < 0.4 

where p is the proportion of ‘hard centres’ in the jar. Two tests are proposed. 
In test A he takes a random sample of 10 chocolates from the jar and rejects H₀ if the number of 
‘hard centres’ is less than 2. 
a Find the size of test A. 
b Show that the power function of test A is given by 

(1 − p)¹⁰ + 10p(1 − p)⁹. 

In test B he takes a random sample of 5 chocolates from the jar and if there are no ‘hard centres’ 
he rejects H₀, otherwise he takes a second sample of 5 chocolates and H₀ is rejected if there are no 
further ‘hard centres’ on this second occasion. 

c Find the size of test B. 
d Find an expression for the power function of test B. 
The powers for test A and test B for various values of p are given in the table. 

p                  0.1    0.2    0.25   0.3    0.35 
Power for test A   0.74   r      0.24   s      0.09 
Power for test B   0.83   0.54   0.42   0.31   0.22 

e Calculate values for r and s. 
f State, giving a reason, which of the two tests the shopkeeper should use. 


a Size of test A = P(X < 2 | X ~ B(10, 0.4)) 
                 = 0.0464 (4 d.p.) 

b Power of test A = P(X < 2 | X ~ B(10, p)) 
                  = (1 − p)¹⁰ + 10p(1 − p)⁹ 

  The answer is given in this form, so you don't need to factorise, 
  but you could also write this power function as (1 − p)⁹(1 + 9p). 

c Size of test B = P(reject H₀ | p = 0.4) 
                 = P(X = 0) + (1 − P(X = 0)) × P(X = 0), where X ~ B(5, 0.4) 
                 = 0.6⁵ + (1 − 0.6⁵) × 0.6⁵ 
                 = 0.1495 (4 d.p.) 

d Power of test B = P(0 hard centres in first 5) 
                    + P(0 hard centres in second 5 and > 0 hard centres in first 5) 
                  = P(X = 0 | p) + (1 − P(X = 0 | p)) × P(X = 0 | p) 
                  = (1 − p)⁵ + (1 − (1 − p)⁵)(1 − p)⁵ 
                  = (1 − p)⁵(2 − (1 − p)⁵) 
                  = 2(1 − p)⁵ − (1 − p)¹⁰ 

  Consider the conditions that are necessary for H₀ to be rejected: 
  First sample: X = 0 and H₀ rejected. 
  First sample: X > 0; second sample: X = 0 and H₀ rejected. 

e Test A: p = 0.2   Power = (1 − 0.2)¹⁰ + 10(0.2)(1 − 0.2)⁹ 
                          = 0.3758 
                    so r = 0.38 
          p = 0.3   Power = (1 − 0.3)¹⁰ + 10(0.3)(1 − 0.3)⁹ 
                          = 0.1493 
                    so s = 0.15 

f Power for test B > Power for test A for all the given 
  values of p, so he should use test B. 
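
The two power functions in this example can be compared side by side. This is a minimal sketch; the function names `power_a` and `power_b` are illustrative. Evaluating either function at the H₀ value p = 0.4 gives the size of that test, and evaluating at the tabulated values of p reproduces the comparison used in part f.

```python
def power_a(p: float) -> float:
    """Test A: reject H0 if fewer than 2 hard centres in a sample of 10."""
    return (1 - p) ** 10 + 10 * p * (1 - p) ** 9


def power_b(p: float) -> float:
    """Test B: reject H0 if the first sample of 5 has no hard centres;
    otherwise take a second sample of 5 and reject if it has none."""
    none_in_five = (1 - p) ** 5
    return none_in_five + (1 - none_in_five) * none_in_five


print("size A", round(power_a(0.4), 4), "size B", round(power_b(0.4), 4))
for p in (0.1, 0.2, 0.25, 0.3, 0.35):
    print(p, round(power_a(p), 4), round(power_b(p), 4))
```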


Example 


A local park believes the fox population in the area has decreased. They want to test for the 
probability, p, that a fox will be observed on any given day. They count the number of days, X, 
that pass until the first observation of a fox. They test H₀: p = 0.1 against H₁: p < 0.1 and reject H₀ 
if X > 30. 

a Find the size of this test. 
b Find the power function for the test. 

a Size of test = P(X > 30 | X ~ Geo(0.1)) 
               = (1 − 0.1)³⁰ 
               = 0.9³⁰ = 0.0424 (4 d.p.) 

b Power function = P(X > 30 | X ~ Geo(p)) 
                 = (1 − p)³⁰ 
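
As a quick numerical check, the power function (1 − p)³⁰ can be evaluated at the H₀ value and at a few smaller values of p. The values of p below 0.1 are arbitrary illustrations, not taken from the text, and the function name `power` is illustrative.

```python
def power(p: float) -> float:
    """P(X > 30) for X ~ Geo(p): no fox is observed on any of the first 30 days."""
    return (1 - p) ** 30


size = power(0.1)                      # the power function at the H0 value is the size
print("size", round(size, 4))
for p in (0.08, 0.05, 0.02, 0.01):     # illustrative values under H1: p < 0.1
    print(p, round(power(p), 4))
```

The power increases as p falls further below 0.1, as expected for a test of H₁: p < 0.1.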


(E/P) 1 A single observation x is taken from a Poisson distribution with parameter λ. This observation is 
to be used to test H₀: λ = 6.5 against H₁: λ < 6.5. The critical region chosen was x ≤ 2. 

a Find the size of the test. (4 marks) 
b Show that the power function of this test is given by 
e^(−λ)(1 + λ + ½λ²) (3 marks) 
The table gives the value of the power function to two decimal places. 

λ       1      2   3      4      5   6 
Power   0.92   s   0.42   0.24   t   0.06 

c Calculate values for s and t. (1 mark) 
d Draw a graph of the power function. (1 mark) 
e Find the values of λ for which the test is more likely than not to come to the correct 
conclusion. (1 mark) 


2 In a binomial experiment consisting of 12 trials, X represents the number of successes and p the 
probability of a success. 

In a test of H₀: p = 0.45 against H₁: p < 0.45 the null hypothesis is rejected if the number of 
successes is 2 or less. 

a Find the size of this test. (4 marks) 
b Show that the power function for this test is given by 
(1 − p)¹² + 12p(1 − p)¹¹ + 66p²(1 − p)¹⁰ (3 marks) 
c Find the power of this test when p is 0.3. (1 mark) 

3 In a binomial experiment consisting of 10 trials, the random variable X represents the number of 
successes and p the probability of a success. 

In a test of H₀: p = 0.4 against H₁: p > 0.4, a critical region of x ≥ 8 was used. 
Find the power of this test when: 
a p = 0.5 
b p = 0.8. 
c Comment on your results. 

4 A certain gambler always calls heads when a coin is spun. Before he uses a coin he tests it to see 
whether or not it is fair and uses the following hypotheses: 

H₀: p = ½   H₁: p < ½ 

where p is the probability that the coin lands heads on a particular spin. Two tests are proposed. 

In test A the coin is spun 10 times and H₀ is rejected if the number of heads is 2 or fewer. 

a Find the size of test A. (4 marks) 
b Explain why the power of test A is given by 
(1 − p)¹⁰ + 10p(1 − p)⁹ + 45p²(1 − p)⁸ (3 marks) 

In test B the coin is first spun 5 times. If no heads result, H₀ is immediately rejected. Otherwise 
the coin is spun a further 5 times and H₀ is rejected if no heads appear on this second occasion. 

c Find the size of test B. (4 marks) 
d Find an expression for the power of test B in terms of p. (3 marks) 

The power for test A and the power for test B are given in the table for various values of p. 

p                  0.1      0.2      0.25     0.3      0.35     0.4 
Power for test A   0.9298   0.6778            0.3828            0.1673 
Power for test B   0.8323   0.5480   0.4183   0.3079   0.2186   0.1495 

e Find the power for test A when p is 0.25 and 0.35. (2 marks) 
f Giving a reason, advise the gambler about which test he should use. (1 mark) 

5 In an experiment the probability of success in each trial is constant, and the random variable X 
represents the number of trials needed to get one success. A test of H₀: p = 0.15 against 
H₁: p < 0.15 with a 1% significance level is used. 

a Find the size of the test. 
b Find the power function. 

(E/P) 6 A cyclist uses new tyres every time he does a time trial. He has found that on one specific route 
he has a probability of 0.9 of not getting a flat tyre. After changing tyre brands he believes that 
the new tyres are more resistant, and decides to perform a test, with 5% significance level, by 
doing 10 trials on the route and seeing how many times he would complete it without a flat tyre. 
a Find the size of the test. (4 marks) 
b Find the power function of the test. (2 marks) 
c Find the power function for the test if, instead of 10 trials, he had done 12 trials. (5 marks) 
d Given that the probability of completing the trial without a flat tyre with the new brand is 
0.95, calculate which number of trials gives a more accurate test result. (3 marks) 


Mixed exercise 8 


(E) 1 The random variable X is binomially distributed. A sample of 15 observations is taken and it is 
desired to test H₀: p = 0.35 against H₁: p > 0.35 using a 5% significance level. 

a Find the critical region for this test. (4 marks) 
b State the probability of making a Type I error for this test. (2 marks) 
The true value of p was found later to be 0.5. 
c Calculate the power of this test. (2 marks) 

2 The random variable X has a Poisson distribution. A sample is taken and it is desired to test 
H₀: λ = 3.5 against H₁: λ < 3.5 using a 5% significance level. 

a Find the critical region for this test. (4 marks) 
b State the probability of committing a Type I error for this test. (2 marks) 
Given that the true value of λ is 3.0, 
c find the power of this test. (2 marks) 

3 The random variable X ~ N(μ, 9). A random sample of 18 observations is taken, and it is desired 
to test H₀: μ = 8 against H₁: μ ≠ 8, at the 5% significance level. The test statistic to be used is 

Z = (X̄ − μ) / (σ/√n). 

a Find the critical region for this test. (4 marks) 
b State the probability of a Type I error for this test. (2 marks) 
Given that μ was later found to be 7, 
c find the probability of making a Type II error. (2 marks) 
d State the power of this test. (1 mark) 

(E/P) 4 A bird observatory wishes to test whether the migration rate of geese has changed from that of 
10 per day. First they take note of how many geese are observed flying in a migratory pattern 
on a specific day. If the number of geese migrating is greater than or equal to 4 and less than or 
equal to 17, then they conclude that the rate has not changed. If they observe 3 or fewer geese, 
then on the following day they conduct further observations, and if they observe 2 or fewer 
geese they conclude that the rate has decreased, otherwise they conclude that it hasn’t changed. 
If on the first day they observe 18 or more geese migrating, then on the following day they also 
conduct further observations and if they observe 19 or more geese migrating they conclude that 
the rate has increased, otherwise they conclude that it has not changed. 

a Find the size of the test. (4 marks) 
Given that the migration rate of the geese actually dropped to 5 per day, 
b find the power of the test. (6 marks) 


5 A single observation, x, is taken from a Poisson distribution with parameter λ. The observation 
is used to test H₀: λ = 4.5 against H₁: λ > 4.5. The critical region chosen for this test was x ≥ 8. 

a Find the size of this test. (4 marks) 
b The table gives the power of the test for different values of λ. 

λ       1   2        3        4   5        6   7        8        9   10 
Power   0   0.0011   0.0119   r   0.1334   s   0.4013   0.5470   t   0.7798 

i Find values for r, s and t. (2 marks) 
ii Using graph paper, plot the power function against λ. (2 marks) 

6 In a binomial experiment consisting of 15 trials, X represents the number of successes and p the 
probability of success. 

In a test of H₀: p = 0.45 against H₁: p < 0.45 the critical region for the test was X ≤ 3. 

a Find the size of the test. (4 marks) 
b Use the binomial cumulative distribution function to complete the table given below. 
(3 marks) 

p       0.1     0.2   0.3      0.4   0.5 
Power   0.944   s     0.2969   t     0.0176 

c Draw the graph of the power function for this test. (1 mark) 


7 A company buys rope from Bindings Ltd and it is known that the number of faults per 100 m 
of their rope follows a Poisson distribution with mean 2. The company is offered 100 m of rope 
by Tieup, a newly established rope manufacturer. The company is concerned that the rope from 
Tieup might be of poor quality. 

a Write down the null and alternative hypotheses appropriate for testing that rope from 
Tieup is in fact as reliable as that from Bindings Ltd. (1 mark) 
b Derive a critical region to test your null hypothesis with a size of approximately 0.05. 
(4 marks) 
c Calculate the power of this test if rope from Tieup contains an average of 4 faults 
per 100 m. (3 marks) 

(E/P) 8 The number of faulty garments produced per day by machinists in a clothing factory has 
a Poisson distribution with mean 2. A new machinist is trained and the number of faulty 
garments made in one day by the new machinist is counted. 


a Write down the appropriate null and alternative hypotheses involved in testing the theory 
that the new machinist is less reliable than the other machinists. (1 mark) 


b Derive a critical region, of size approximately 0.05, to test the null hypothesis. (4 marks) 


c Calculate the power of this test if the new machinist produces an average of 3 faulty 
garments per day. (3 marks) 


The number of faulty garments produced by the new machinist over three randomly selected 
days is counted. 


d Derive a critical region, of approximately the same size as in part b, to test the null 


hypothesis. (2 marks) 
e Calculate the power of this test if the machinist produces an average of 3 faulty garments 

per day. (3 marks) 
f Comment briefly on the difference between the two tests. (1 mark) 


(E/P) 9 A single observation, x, is to be taken from a Poisson distribution with parameter μ. 
This observation is to be used to test H₀: μ = 6 against H₁: μ < 6. The critical region is chosen 
to be x ≤ 2. 

a Find the size of the critical region. (1 mark) 
b Show that the power function for this test is given by 

½e^(−μ)(2 + 2μ + μ²) (4 marks) 
The table gives the values of the power function to 2 decimal places. 

μ       1.0    1.5    2.0   4.0    5.0   6.0    7.0 
Power   0.92   0.81   s     0.24   t     0.06   0.03 

c Calculate the values of s and t. (3 marks) 
d Draw a graph of the power function. (2 marks) 
e Estimate the range of values of μ for which the power of this test is greater than 0.8. (3 marks) 


(E/P) 10 A proportion p of the items produced by a laboratory are defective. A technician selects a 
random sample of 10 items from each batch produced to check whether or not there is evidence 
that p > 0.10. The criterion that the technician uses for rejecting the hypothesis that p is 0.10 is 
that there are more than 4 defective items in the sample. 

a Find the size of the test. (2 marks) 


The table gives some values, to 2 decimal places, of the power function of this test. 

p       0.15   0.20   0.25   0.30   0.35   0.40 
Power   0.01   0.03   u      0.15   0.25   0.37 

b Find the value of u. (2 marks) 

A supervisor checks the production by taking a random sample of 5 items from each batch 
produced. The hypothesis that p = 0.10 is rejected if more than 2 defective items are found in 
the sample. 

c Find P(Type I error) using the supervisor’s test. (2 marks) 

The table gives some values, to 2 decimal places, of the power function for the test in part c. 

p       0.15   0.20   0.25   0.30   0.35   0.40 
Power   0.03   0.06   0.10   0.16   v      0.32 

d Find the value of v. (2 marks) 
e Using the same axes, on graph paper draw the graphs of the power functions of these 
two tests. (4 marks) 
f i State the value of p where the graphs cross. 
ii Explain the significance of p being greater than this value. (2 marks) 
g Suggest two advantages of using the test with sample size 5. (2 marks) 


11 Accidents on a stretch of motorway occur at an average rate of λ per week. A road safety 
officer takes a random sample of 10 weeks to test whether or not there is evidence that λ > 0.3. 
The criterion that the officer uses for rejecting the hypothesis that λ = 0.3 is that there are more 
than 5 accidents in the sample. 

a Find the size of the test. (2 marks) 

The table gives some values, to 2 decimal places, of the power function of this test. 

λ       0.4    0.5   0.6    0.7    0.8    0.9    1.0 
Power   0.21   a     0.55   0.70   0.81   0.88   0.93 

b Find the value of a. (2 marks) 

The road safety manager would like to design a test of whether or not λ > 0.3, using a larger 
sample. The manager chooses a random sample of 15 weeks and requires the probability of a 
Type I error to be less than 5%. 

c Find the criterion to reject the hypothesis that λ = 0.3 which makes the test as powerful 
as possible. (2 marks) 
d Hence state the size of this second test. (1 mark) 

The table gives some values, to 2 decimal places, of the power function for the test in part c. 

λ       0.4    0.5    0.6    0.7    0.8    0.9   1.0 
Power   0.15   0.34   0.54   0.72   0.85   b     0.96 

e Find the value of b. (2 marks) 
f Using the same axes, on graph paper draw the graphs of the power functions of these 
two tests. (4 marks) 
g i State the value of λ where the graphs cross. 
ii Explain the significance of λ being greater than this value. (2 marks) 


Challenge 


Jane and Emma decide to test a pair of dice from a new board game. 
They suspect that at least one of them has probability higher than 1/6 of 
showing the value one. 


Jane decides to throw both dice 12 times. If a pair of ones appears 2 or 
more times, she concludes that at least one of the dice is biased. 


a Find the size of Jane’s test. 


b Express the power of Jane’s test in terms of the parameter p, which 
represents the probability of obtaining a pair of ones. 


Emma decides to throw one dice 6 times. If the value one appears 4 or 
more times she concludes that the dice is biased. If it appears fewer 
than 4 times, then she throws the other dice 6 times, and concludes that 
the second dice is biased if the value one appears 4 or more times. 


c Find the size of Emma’s test. 


Now assume that one of the dice is fair, and let g be the probability of 
obtaining the value one on the other dice. 


d Show that the power of Jane’s test is given by the expression 


1 − (1 − q/6)¹¹(1 + 11q/6) 


e Show that the power of Emma’s test is given by the expression 


0.0087 + 14.8695q⁴ − 23.7912q⁵ + 9.913q⁶ 



f By using a table of values, draw the graph of the power function for 
Jane’s test. 

g Given that the parameter g lies between 0.1 and 0.4, explain, giving 
your reasons, which test you would recommend. 
A Type I error is when you reject H₀, but H₀ is in fact true. The probability of a Type I error is 
the same as the actual significance level of the hypothesis test. 


A Type II error is when you accept H₀, but H₀ is in fact false. 


When a continuous distribution such as the normal distribution is used then P(Type I error) is 
equal to the significance level of the test. 


The size of a test is the probability of rejecting the null hypothesis when it is in fact true and 
this is equal to the probability of a Type I error. 


The power of a test is the probability of rejecting the null hypothesis when it is not true. 
Power = 1 − P(Type II error) = P(being in the critical region when H₀ is false) 


The power function of a test is the function of the parameter θ which gives the probability 
that the test statistic will fall in the critical region of the test if θ is the true value of the 
parameter. 


When comparing two tests of comparable size, you should recommend the test with the higher 
power within the likely range of the parameter. 


Review exercise 


1 A quality control manager regularly samples 20 items from a production line 
and records the number of defective items x. The results of 100 such samples are 
given below. 

x           0    1    2    3    4   5   6   7 or more 
Frequency   17   31   19   14   9   7   3   0 

a Estimate the proportion of defective items from the production line. (1) 

The manager claims that the number of defective items in a sample of 20 can be 
modelled by a binomial distribution. He uses the answer in part a to calculate the 
expected frequencies given below. 

x                    0      1      2   3      4   5     6     7 or more 
Expected frequency   12.2   27.0   r   19.0   s   3.2   0.9   0.2 

b Find the value of r and the value of s giving your answers to 1 decimal place. (2) 
c Stating your hypotheses clearly, use a 5% level of significance to test the 
manager’s claim. (6) 
d Explain what the analysis in part c tells the manager about the occurrence of 
defective items from this production line. (2) 
← Section 6.4 


2 Five coins were spun 100 times and the number of heads recorded. The results are 
shown in the table below. 

Number of heads   0   1    2    3    4    5 
Frequency         6   18   29   34   10   3 

a Suggest a suitable distribution to model the number of heads when five 
unbiased coins are spun. (1) 
b Test, at the 10% level of significance, whether or not the five coins are 
unbiased. State your hypotheses clearly. (6) 
← Section 6.4 


3 Ten cuttings were taken from each of 100 randomly selected garden plants. The 
number of cuttings that did not grow were recorded. 

The results are as follows. 

Number which did not grow   0    1    2    3    4    5   6   7   8, 9 or 10 
Frequency                   11   21   30   20   12   3   2   1   0 

a Show that the probability of a randomly selected cutting, from this sample, not 
growing is 0.223. (2) 

A gardener believes that a binomial distribution might provide a good model 
for the number of cuttings, out of 10, that do not grow. 

He uses a binomial distribution, with the probability 0.2 of a cutting not growing. 
The calculated expected frequencies are as follows. 

Number which did not grow   0   1       2   3       4      5 or more 
Expected frequency          r   26.84   s   20.13   8.81   t 

b Find the values of r, s and t. (3) 


Review exercise 2 


c State clearly the hypotheses required 
to test whether or not this binomial 
distribution is a suitable model for these 
data. (1) 


The test statistic for the test is 4.17 and the 
number of degrees of freedom used is 4. 


d Explain fully why there are 4 degrees of 
freedom. (2) 


e Stating clearly the critical value used, carry out the test using a 5% level of 
significance. (4) 
← Section 6.4 


4 The number of times per day a computer fails and has to be restarted is recorded for 
200 days. The results are summarised in the table. 

Number of restarts   Frequency 
0                    99 
1                    65 
2                    22 
3                    12 
4                    2 

Test whether or not a Poisson model is suitable to represent the number of 
restarts per day. Use a 5% level of significance and state your hypothesis 
clearly. (6) 
← Section 6.4 


The Director of Studies at a large 

college believes that students’ grades in 
Mathematics are independent of their 
grades in English. She examined the results 
of arandom group of candidates who had 
studied both subjects and she recorded 

the number of candidates in each of the 6 
categories shown. 


Maths grade 
AorB | CorD | EorU 
English | A or B 25 25 10 
grade | CtoU 5 30 15 
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a Stating your hypotheses clearly, test the 
Director’s belief using a 10% level of 
significance. You must show each step 
of your working. 


The Head of English suggested that the 
Director was losing accuracy by combining 
the English grades C to U in one row. He 
suggested that the Director should split the 
English grades into two rows, grades C or 
D and grades E or U as for Mathematics. 


(7) 


b State why this might lead to problems in 
performing the test. (2) 
< Section 6.5 


6 People over the age of 65 are offered an annual flu injection. A health official took a random sample from a list of patients who were over 65. She recorded their gender and whether or not the offer of an annual flu injection was accepted or rejected. The results are summarised below.

                | Accepted | Rejected
Gender | Male   |   170    |   110
       | Female |   280    |   140

Using a 5% significance level, test whether or not there is an association between gender and acceptance or rejection of an annual flu injection. State your hypotheses clearly. (7)
← Section 6.5
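The contingency-table calculation behind a test like this can be sketched as follows. This is an illustrative outline only: expected frequencies are taken as (row total × column total) / grand total, and no continuity correction is applied, which is an assumption rather than something stated in the question.

```python
# Observed frequencies: rows are male, female; columns are accepted, rejected.
observed = [[170, 110], [280, 140]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell under the hypothesis of no association.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Test statistic: sum over all cells of (O - E)^2 / E.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1)(c - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_sq, 4), dof)
```

The statistic is then compared with the tabulated 5% critical value for the stated degrees of freedom.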


7 Students in a mixed sixth form college are classified as taking courses in either arts, science or humanities. A random sample of students from the college gave the following results.

              | Course
              | Arts | Science | Humanities
Gender | Boy  |  30  |   50    |    35
       | Girl |  40  |   20    |    42

Showing your working clearly, test, at the 1% level of significance, whether or not there is an association between gender and the type of course taken. State your hypotheses clearly. (7)
← Section 6.5


8 A researcher carried out a survey of three treatments for a fruit tree disease.

                             | No action | Remove diseased branches | Spray with chemicals
Tree died within 1 year      |    10     |            5             |          6
Tree survived for 1-4 years  |     ?     |            ?             |          ?
Tree survived beyond 4 years |     ?     |            ?             |          ?

Test, at the 5% level of significance, whether or not there is any association between the treatment of the trees and their survival. State your hypotheses and conclusion clearly. (7)
← Section 6.5


9 A research worker studying colour preference and the age of a random sample of 50 children obtained the results shown below.

Age    | Red | Blue | Totals
4      | 12  |  6   |  18
8      | 10  |  7   |  17
12     |  6  |  9   |  15
Totals | 28  | 22   |  50

Using a 5% significance level, carry out a test to decide whether or not there is an association between age and colour preference. State your hypotheses clearly. (7)
← Section 6.5


10 A celebrity receives fan mail six days a week. She thinks that the deliveries of mail are uniformly distributed throughout the week. The deliveries over a five-week period are as follows:

Day       | Mon | Tues | Wed | Thurs | Fri | Sat
Frequency | 20  | 15   | 18  | 23    | 19  | 25

Test the celebrity’s assertion using a 1% level of significance. (6)
← Section 6.4


11 Philomena has a large collection of DVDs, but she estimates that she only likes 40% of them. Every evening she picks DVDs from the rack at random until she finds one she likes. Over the course of two months she records the number of DVDs that she has picked each evening before finding one she likes. Her data is shown in the table below.

Number of DVDs | 1  | 2  | 3  | 4 | 5
Frequency      | 30 | 18 | 12 | 1 | 1

a Calculate the expected frequencies if the number of DVDs chosen is modelled as a geometric random variable Y ~ Geo(0.4). (3)

Philomena wants to test her belief that the proportion of her DVDs that she likes is actually 40%.

b Write down suitable null and alternative hypotheses. (2)
c Is Philomena right in her belief? Use a 1% level of significance. (6)
d State the effect on Philomena’s conclusion if she had used a 2% level of significance. (1)
← Section 6.6


(E/P) 12 The probability generating function of a discrete random variable X is given by


Gy(t) = k(3 + 2t+ PP 
a Find the value of k. (2)
b Find P(X = 1). (2)
← Section 7.1


(E/P) 13 Billy is practising archery. His probability of hitting a ‘gold’ in any one attempt is 0.24. In his first practice session, he fires arrows at a target until he hits a ‘gold’.

a Suggest a suitable model for the random variable X, the number of arrows it takes him to hit a ‘gold’. (1)
b Find P(X = 7). (1)
c Write down the probability generating function for X. (2)

In his second practice session, he fires 15 arrows at the target.

d Write down the probability generating function for the random variable Y, the number of times he hits a ‘gold’. (2)

In his third practice session, he continues to fire arrows until he has hit a ‘gold’ four times.

e Write down the probability generating function for the random variable Z, the number of shots it takes to hit four ‘golds’. (2)
← Section 7.2


(E/P) 14 Calls come into a help desk at a rate of 1.7 per two-minute interval. Given that the random variable X is the number of calls that come in during a random two-minute interval and that the calls are independent and random, show, from first principles, that the probability generating function for X is:
$G_X(t) = e^{1.7(t-1)}$ (5)
← Section 7.2
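One possible outline of the first-principles argument is sketched below. It assumes that the stated conditions justify the model X ~ Po(1.7) and uses the definition $G_X(t) = E(t^X)$ together with the series expansion of the exponential function; it is an illustration, not the only acceptable layout.

```latex
\begin{align*}
G_X(t) = E(t^X) &= \sum_{x=0}^{\infty} t^x \, P(X = x)
                 = \sum_{x=0}^{\infty} t^x \, \frac{e^{-1.7}\, 1.7^x}{x!} \\
                &= e^{-1.7} \sum_{x=0}^{\infty} \frac{(1.7t)^x}{x!}
                 = e^{-1.7}\, e^{1.7t}
                 = e^{1.7(t-1)}
\end{align*}
```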


(E/P) 15 A discrete random variable X has probability generating function


Gi0=G—p 
a Find the mean and standard deviation of X. (4)
b Find:
  i P(X = 0)
  ii P(X = 1) (4)
← Section 7.3


(E/P) 16 A random variable X has a probability generating function


Gy(t) =kP(1 + 3°)? 


a Find the value of k. (2) 
b Find P(X = 4). (2) 
c Use the probability generating 
function to show E(X) = 5 and 
Var(X) =; (6) 
A second random variable Y has a probability generating function
ee 
GrO= (4+ 9!) 
Given that X and Y are independent,

d find E(Y) and write down the value of E(X + Y). (3)
← Section 7.3, 7.4


17 The probability generating function of a discrete random variable X is given by:
Gt) = k(t + 4 + 28) 
a Show that k= + (2) 
b Find P(X = 3). (2) 
c Show that E(X) = a and find Var(X). 
(6) 
d Find a probability generating function of 3X − 2. (2)
← Section 7.3, 7.4
18 a Define
  i a Type I error (1)
  ii a Type II error (1)

A small aviary, that leaves the eggs with the parent birds, rears chicks at an average rate of 5 per year. In order to increase the number of chicks reared per year, it is decided to remove the eggs from the aviary as soon as they are laid and put them in an incubator. At the end of the first year of using an incubator 7 chicks had been successfully reared.

b Assuming that the number of chicks reared per year follows a Poisson distribution, test, at the 5% significance level, whether or not there is evidence of an increase in the number of chicks reared per year. State your hypotheses clearly. (4)
c Calculate the probability of the Type I error for this test. (2)
d Given that the true average number of chicks reared per year when the eggs are hatched in an incubator is 8, calculate the probability of a Type II error. (2)
← Section 8.1


19 A butter-packing machine cuts butter into blocks. The weight of a block of butter is normally distributed with a mean weight of 250 g and a standard deviation of 4 g.

A random sample of 15 blocks is taken to monitor any change in the mean weight of the blocks of butter.

a Find the critical region of a suitable test using a 2% level of significance. (4)
b Assuming the mean weight of a block of butter has increased to 254 g, find the probability of a Type II error. (2)
← Section 8.2


(E/P) 20 It is suggested that a Poisson distribution with parameter $\lambda$ can model the number of currants in a currant bun. A random bun is selected in order to test the hypotheses $H_0: \lambda = 8$ against $H_1: \lambda \neq 8$, using a 10% level of significance.

a Find the critical region for this test, such that the probability in each tail is as close as possible to 5%. (4)
b Given that $\lambda = 10$, find:
  i the probability of a Type II error (2)
  ii the power of the test. (2)
← Section 8.1, 8.3


(E/P) 21 A train company claims that the probability p of one of its trains arriving late is 10%. A regular traveller on the company’s trains believes that the probability is greater than 10% and decides to test this by randomly selecting 12 trains and recording the number of trains that were late, X. The traveller sets up the hypotheses $H_0: p = 0.1$ and $H_1: p > 0.1$ and accepts the null hypothesis if $x \le 2$.

a Find the size of the test. (2)
b Show that the power function of the test is $1 - (1 - p)^{10}(1 + 10p + 55p^2)$. (4)
c Calculate the power of the test when
  i p = 0.2 (2)
  ii p = 0.6 (2)
d Comment on your results from part c. (2)
← Section 8.1, 8.3
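The relationship between the size, the power function and the binomial probabilities in this question can be checked numerically. The sketch below assumes X ~ B(12, p) with H0 rejected when X is 3 or more; the function names are illustrative.

```python
import math


def binom_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def power(p, n=12, accept_up_to=2):
    """P(reject H0 | p) when H0 is accepted for X <= accept_up_to."""
    return 1 - sum(binom_pmf(n, p, x) for x in range(accept_up_to + 1))


def power_closed_form(p):
    """The closed form quoted in part b."""
    return 1 - (1 - p) ** 10 * (1 + 10 * p + 55 * p ** 2)


# The size of the test is the power function evaluated at the H0 value p = 0.1.
size = power(0.1)

for p in (0.1, 0.2, 0.6):
    print(p, round(power(p), 4), round(power_closed_form(p), 4))
```

Evaluating both functions over a range of p gives a quick check that the algebra in part b has been carried out correctly.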


22 a Define
  i the power of a test (1)
  ii the size of a test. (1)

Jane claims that she can read Alan’s mind. To test this claim Alan randomly chooses a card with one of 4 symbols on it. He then concentrates on the symbol. Jane then attempts to read Alan’s mind by stating what symbol she thinks is on the card. The experiment is carried out 8 times and the number of times, X, that Jane is correct is recorded.

The probability of Jane stating the correct symbol is denoted by p.

To test the hypothesis $H_0: p = 0.25$ against $H_1: p > 0.25$, a critical region of X > 6 is used.

b Find the size of this test. (2)
c Show that the power function of this test is $8p^7 - 7p^8$ (4)

Given that p = 0.3, calculate:
d the power of this test, (2)
e the probability of a Type II error. (2)
f Suggest two ways in which you might reduce the probability of a Type I error. (2)
← Section 8.1, 8.3, 8.4


(E/P) 23 The number of burglaries per year in a particular county follows a Poisson distribution with mean $\lambda$ per 1000 households. A police commissioner claims that the mean number of burglaries per year has decreased. Using $H_0: \lambda = 4$ and $H_1: \lambda < 4$ the commissioner takes a sample of 1000 households and rejects $H_0$ if there are 2 or fewer burglaries. If there are 5 or more burglaries then $H_0$ is accepted.

If there are 3 or 4 burglaries, a second sample of 1000 households is taken and $H_0$ is rejected if there are 2 or fewer burglaries in this second sample, otherwise it is accepted.

a Find the size of this test. (3)
b Show that the power function for this test is given by
$e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}{2}\right) + e^{-2\lambda}\left(1 + \lambda + \frac{\lambda^2}{2}\right)\left(\frac{\lambda^3}{6} + \frac{\lambda^4}{24}\right)$ (5)
c Find the probability of a Type II error when $\lambda = 3$. (2)
← Section 8.1, 8.3, 8.4
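The two-stage procedure described in this question can be expressed directly as a probability calculation. The sketch below assumes the two samples are independent Po($\lambda$) counts and that H0 is rejected either at the first stage (2 or fewer) or at the second stage (3 or 4 first, then 2 or fewer); the names are illustrative.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def power(lam):
    """P(reject H0 | lam) for the two-stage test."""
    p_two_or_fewer = sum(poisson_pmf(x, lam) for x in range(3))
    p_three_or_four = poisson_pmf(3, lam) + poisson_pmf(4, lam)
    # Reject at stage one, or reach stage two and reject there.
    return p_two_or_fewer + p_three_or_four * p_two_or_fewer


def power_closed_form(lam):
    """The same probability written out as a formula in lam."""
    low = 1 + lam + lam ** 2 / 2
    mid = lam ** 3 / 6 + lam ** 4 / 24
    return math.exp(-lam) * low + math.exp(-2 * lam) * low * mid


size = power(4.0)                 # power function at the H0 value
type_ii_at_3 = 1 - power(3.0)     # P(accept H0 | lam = 3)

print(round(size, 4), round(type_ii_at_3, 4))
```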


(E/P) 24 A drug is claimed to produce a cure to a certain disease in 35% of people who have the disease. To test this claim a sample of 20 people having this disease is chosen at random and given the drug. If the number of people cured is between 4 and 10 inclusive, the claim will be accepted. Otherwise the claim will not be accepted.

a Write down suitable hypotheses to carry out this test. (2)
b Find the probability of making a Type I error. (2)

The table below gives the value of the probability of the Type II error, to 4 decimal places, for different values of p where p is the probability of the drug curing a person with the disease.

P(cure)          | 0.2    | 0.3 | 0.4    | 0.5
P(Type II error) | 0.5880 | r   | 0.8565 | s

c Calculate the value of r and the value of s. (2)
d Calculate the power of the test for p = 0.2 and p = 0.4 (2)
e Comment, giving your reasons, on the suitability of this test procedure. (2)
← Section 8.1, 8.3
(E/P) 25 A manager in a flour mill believes that the machines are working incorrectly and the proportion p of underweight bags of flour is more than 5%. She decides to test this by randomly selecting a sample of 5 bags and recording the number x that are underweight. The manager sets up the hypotheses $H_0: p = 0.05$ and $H_1: p > 0.05$ and rejects the null hypothesis if x > 1.

a Find the size of the test. (2)
b Show that the power function of the test is $1 - (1 - p)^4(1 + 4p)$ (3)

The manager goes on holiday and her assistant checks the production by randomly selecting a sample of 10 bags of flour. The assistant rejects the hypothesis that p = 0.05 if more than 2 underweight bags are found in the sample.

c Find the probability of a Type I error using the assistant’s test. (2)

The table below gives some values, to 2 decimal places, of the power function for the assistant’s test.

p     | 0.10 | 0.15 | 0.20 | 0.25
Power | 0.07 | 0.18 | 0.32 | a

d Find the value of a. (3)
e On the same axes, draw the graph of the power function for the manager’s and the assistant’s tests. (4)
f Given that p > 0.2, state, with a reason, which test you would recommend. (2)

The assistant suggests that they should use his sampling method rather than the manager’s.
g Give two reasons why the manager might not agree to this change. (2)
← Section 8.1, 8.3, 8.4




Challenge 


1 A manufacturer claims that the batteries used in his mobile phones have a mean lifetime of 360 hours 
and a standard deviation of 20 hours, when the phone is left on standby. To test this claim 100 phones 
were left on standby until the batteries ran flat. The lifetime t hours of the batteries was recorded.
The results are as follows.

t         | 300– | 320– | 340– | 350– | 360– | 370– | 380– | 400–
Frequency | 1    | 9    | 28   | 20   | 16   | 18   | 7    | 1


A researcher believes that a normal distribution might provide a good model for the lifetime of the batteries.

She calculated the expected frequencies as follows using the distribution N(360, 20²).


t <A) || 320= | S40— | 3E0> | se || 3Sv0= | se0= | Aoe- 
Expected frequency | 2.28 13.59 | 24.26 r § WA) || 13.59) Efe 


a Find the values of r and s.
b Stating clearly your hypotheses, test, at the 1% level of significance, whether or not this normal distribution is a suitable model for these data.
← Section 6.4


2 You can use the following rule to find the distribution of a sum of N identical independent random variables, where N is itself an independent random variable:

$X_1, X_2, X_3, \ldots$ are identically distributed independent random variables, each with probability generating function $G_X(t)$, and N is a random variable with probability generating function $G_N(t)$.

If $S = \sum_{i=1}^{N} X_i = X_1 + X_2 + \ldots + X_N$ then the probability generating function of S is given by $G_S(t) = G_N(G_X(t))$.

a Use the above result to show further that:
  i E(S) = E(N)E(X)
  ii Var(S) = E(N)Var(X) + (E(X))² Var(N)


A bank models the number of people who use its external cash machine each hour as a Poisson random variable Po($\lambda$). Each person who uses the cash machine makes a balance enquiry with probability p.


b Show that the total number of balance enquiries made at the external cash machine each hour also 
has a Poisson distribution and determine its parameter. 


The bank models the number of people who use its internal cash machine each hour as a binomial random variable B(n, q), where n is the number of customers who visit the bank each hour, and q is the probability that each customer uses the machine. Given that each of these people also makes a balance enquiry with probability p,

c show that the total number of balance enquiries made at the internal cash machine also has a binomial distribution, and determine its parameters.


Given further that $\lambda = 75$, n = 80 and q = 0.25, and that each person who uses a cash machine withdraws either £0, £10, £20 or £50 with probabilities 0.1, 0.3, 0.4 and 0.2 respectively,
d find the mean and standard deviation of the total amount of money withdrawn each hour at
  i the external cash machine
  ii the internal cash machine.
← Sections 7.1, 7.2, 7.3, 7.4
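The two moment results in part a, and the composition rule itself, lend themselves to a short numerical illustration. The sketch below is only a check under stated assumptions: the helper names are hypothetical, and the enquiry probability `p_enquiry = 0.3` is an arbitrary illustrative value, since the question leaves p general.

```python
import math


def compound_mean_var(mean_n, var_n, mean_x, var_x):
    """Mean and variance of S = X_1 + ... + X_N using the results in part a."""
    mean_s = mean_n * mean_x
    var_s = mean_n * var_x + mean_x ** 2 * var_n
    return mean_s, var_s


# Distribution of a single withdrawal (amount in pounds -> probability).
amounts = {0: 0.1, 10: 0.3, 20: 0.4, 50: 0.2}
mean_x = sum(a * pr for a, pr in amounts.items())
var_x = sum(a ** 2 * pr for a, pr in amounts.items()) - mean_x ** 2

lam, n, q = 75, 80, 0.25
external = compound_mean_var(lam, lam, mean_x, var_x)                  # N ~ Po(75)
internal = compound_mean_var(n * q, n * q * (1 - q), mean_x, var_x)    # N ~ B(80, 0.25)

print(external[0], math.sqrt(external[1]))
print(internal[0], math.sqrt(internal[1]))


# Composition check: a Po(lam) number of people, each making an enquiry with
# probability p_enquiry, compared with the pgf of Po(lam * p_enquiry).
def pgf_poisson(s, rate):
    return math.exp(rate * (s - 1))


def pgf_indicator(t, p):
    return 1 - p + p * t


p_enquiry = 0.3  # hypothetical value, for illustration only
for t in (0.0, 0.5, 0.9):
    composed = pgf_poisson(pgf_indicator(t, p_enquiry), lam)
    assert math.isclose(composed, pgf_poisson(t, lam * p_enquiry))
```

A numerical agreement of this kind does not replace the algebraic proofs requested in parts a to c, but it is a useful sanity check on the final expressions.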
Exam-style practice


Further Mathematics 
AS Level 
Further Statistics 1 


Time: 50 minutes 
You must have: Mathematical Formulae and Statistical Tables, Calculator 


1 The discrete random variable X has probability distribution given by:

x        | −2  | −1 | 0    | 1   | 2
P(X = x) | 0.1 | a  | 0.15 | 0.2 | b

The random variable Y = 2X + 3. Given that E(Y) = 4.48,

a find the values of a and b (5)
b calculate the exact value of Var(X) (3)
c find P(Y − 2 > X). (2)


2 A call centre receives calls about insurance at a rate of 3.2 per ten-minute interval and calls about utility bills at a rate of 4.1 per ten-minute interval. Calls about insurance and calls about utility bills are independent of each other.

a In a ten-minute interval, calculate the probability that the company receives exactly 3 calls of each type. (2)
b In a ten-minute interval, calculate the probability that the company receives at least 7 calls in total. (2)
c In a one-hour period, calculate the probability that the company receives fewer than 45 calls in total. (2)


3 A sports club collects data on the gender of its members and the sport that they play. A random sample of 250 members is taken and the data is recorded in the table below:

       | Hockey | Cricket | Squash
Male   |   61   |   45    |   32
Female |   66   |   23    |   23

A test is to be carried out at a 2.5% level of significance to determine whether or not there is an association between gender and choice of sport.

a Write down suitable null and alternative hypotheses. (2)
b Calculate the test statistic for this test. (4)
c State the number of degrees of freedom of the test. (1)
d State whether or not the null hypothesis is accepted. Give a reason for your answer. (2)
e State the effect on your answer to part d if the test was carried out at the 5% level of significance. (1)




4 A large pottery believes that 0.5% of the bowls that they make contain a defect. A quality control manager takes a random sample of 750 bowls.

a Find the mean and variance of the number of bowls in the sample with a defect. (2)
b By using a Poisson approximation, estimate the probability that more than three bowls in the sample have a defect. (2)
c Give a reason to support the use of a Poisson approximation. (1)


5 Jennifer spins four identical coins 100 times and records the number of heads each time. The results are shown in the table below.

Number of heads | 0 | 1  | 2  | 3  | 4
Frequency       | 6 | 18 | 35 | 26 | 15

a Use these results to estimate the probability of any single coin landing on heads. (2)

She believes that the binomial distribution is a suitable model for the number of heads. Using your answer to part a,

b Carry out a test at the 10% level of significance to check Jennifer’s claim. You must state your hypotheses clearly. (7)
Further Mathematics
A Level 
Further Statistics 1 


Time: 1 hour and 30 minutes 
You must have: Mathematical Formulae and Statistical Tables, Calculator 


1 Joshva works in a call centre. He calls people from a list until the first person answers. Over the
course of a week, he records the number of calls he has to make before a person answers. The 
results are shown in the frequency table below: 

Number of calls 1 2 3 4 5 6 

Frequency 52 31 12 7 1 1 


Joshva believes that the distribution of the random variable ‘number of calls made until 
someone answers’ is Geo(0.4). 


Test, at the 5% level of significance, whether Joshva’s belief is correct. You must state your 
hypotheses clearly. (10) 
2 The probability generating function of the discrete random variable X is given by:
$G_X(t) = k(1 + 2t + 3t^2)^2$

a Show that $k = \frac{1}{36}$ (2)
b Find P(X = 2). (2)
c Show that Var(X) = $\frac{10}{9}$ (8)
d Write down a probability generating function for Y = 2X + 3. (2)


3 Jagdeep is practising darts. He continues to throw darts at the board until he hits the bullseye 
r times. The random variable Y represents the total number of darts he throws. Given that the 
mean and variance of Y are 20 and 374 respectively, 

a Write down a suitable model for this situation and state two assumptions that must be made for it to be valid. (3)
b Find the value of p, the probability of Jagdeep hitting the bullseye with any one dart. (4)
c Find the value of r. (1)


4 The number of flaws, X, in a ten-metre length of cloth is modelled as X ~ Po(2.1).

A random sample of 200 ten-metre lengths of the cloth is taken.

a Find the probability that the sample mean, $\bar{X}$, is greater than 2.3. (4)

A tailor decides to modify the production process to reduce the rate of appearance of flaws. After this modification, he takes a random sample of twenty metres of cloth and finds that there is one flaw in it.

b Test, at the 5% significance level, whether there is evidence that the rate of appearance of flaws has been reduced. (5)
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5 The discrete random variable X has a probability distribution as shown in the table below. 


x || ee 0 2 3 5 
P(X=x) | P| 4 r q q F 
The random variable Y is defined as Y = 2X + 5. Given that E(Y) = 4.9 and that 
P(Y < 9) = 0.55, find: 

a the values of p, q and r (7)
b P(Y >2Y-3). (2)


6 A footballer is practising penalty kicks. He takes 500 penalties per day and the probability of a random penalty missing the goal is p.

The probability that the footballer never misses the goal in four consecutive kicks is 0.8853.

a Find the mean and variance of the number of missed penalties each day. (7)
b Explain why a Poisson approximation can be used to find the probability that the footballer misses more than r penalties per day. (2)


c Use a suitable Poisson approximation to find the probability that the footballer misses more than 17 penalties on a given day. (2)
d Comment on the accuracy of the approximation obtained in part c. (2)

7 Philip is rolling a six-sided dice to test if it is biased against rolling a six. He rolls the dice 25 times and records the number of times a six appears.

a Using a 10% significance level, find the critical region for Philip’s test and write down the size of the test. (3)
b Show that the power function for Philip’s test is given by
$(1 - p)^{24}(1 + 24p)$ (3)

Gemma carries out a different experiment such that she continues to roll the dice until a six appears.
Given that the critical region for Gemma’s test is ‘greater than or equal to 13’,
c find the size of Gemma’s test and write down the power function for her test. (3)
d Give two reasons why you would recommend Philip’s test over Gemma’s test when p = 0.09. (3)
Appendix

BINOMIAL CUMULATIVE DISTRIBUTION FUNCTION

The tabulated value is P(X ≤ x), where X has a binomial distribution with index n and parameter p.
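Entries of the binomial cumulative distribution table, P(X ≤ x) for X ~ B(n, p), can be reproduced with a few lines of code. The sketch below is an illustration using only the Python standard library; the function name `binomial_cdf` is an assumption made for this example.

```python
import math


def binomial_cdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), summing the individual probabilities."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(x + 1)
    )


# Print one column of the table (n = 5, p = 0.05) to four decimal places.
for x in range(5):
    print(x, f"{binomial_cdf(5, 0.05, x):.4f}")
```

The tables stop at x = n − 1 because P(X ≤ n) is always 1.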


n = 5
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.7738 0.5905 0.4437 0.3277 0.2373 0.1681 0.1160 0.0778 0.0503 0.0312
x = 1   0.9774 0.9185 0.8352 0.7373 0.6328 0.5282 0.4284 0.3370 0.2562 0.1875
x = 2   0.9988 0.9914 0.9734 0.9421 0.8965 0.8369 0.7648 0.6826 0.5931 0.5000
x = 3   1.0000 0.9995 0.9978 0.9933 0.9844 0.9692 0.9460 0.9130 0.8688 0.8125
x = 4   1.0000 1.0000 0.9999 0.9997 0.9990 0.9976 0.9947 0.9898 0.9815 0.9688


n = 6
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.7351 0.5314 0.3771 0.2621 0.1780 0.1176 0.0754 0.0467 0.0277 0.0156
x = 1   0.9672 0.8857 0.7765 0.6554 0.5339 0.4202 0.3191 0.2333 0.1636 0.1094
x = 2   0.9978 0.9842 0.9527 0.9011 0.8306 0.7443 0.6471 0.5443 0.4415 0.3438
x = 3   0.9999 0.9987 0.9941 0.9830 0.9624 0.9295 0.8826 0.8208 0.7447 0.6563
x = 4   1.0000 0.9999 0.9996 0.9984 0.9954 0.9891 0.9777 0.9590 0.9308 0.8906
x = 5   1.0000 1.0000 1.0000 0.9999 0.9998 0.9993 0.9982 0.9959 0.9917 0.9844


n = 7
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.6983 0.4783 0.3206 0.2097 0.1335 0.0824 0.0490 0.0280 0.0152 0.0078
x = 1   0.9556 0.8503 0.7166 0.5767 0.4449 0.3294 0.2338 0.1586 0.1024 0.0625
x = 2   0.9962 0.9743 0.9262 0.8520 0.7564 0.6471 0.5323 0.4199 0.3164 0.2266
x = 3   0.9998 0.9973 0.9879 0.9667 0.9294 0.8740 0.8002 0.7102 0.6083 0.5000
x = 4   1.0000 0.9998 0.9988 0.9953 0.9871 0.9712 0.9444 0.9037 0.8471 0.7734
x = 5   1.0000 1.0000 0.9999 0.9996 0.9987 0.9962 0.9910 0.9812 0.9643 0.9375
x = 6   1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9994 0.9984 0.9963 0.9922


n = 8
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.6634 0.4305 0.2725 0.1678 0.1001 0.0576 0.0319 0.0168 0.0084 0.0039
x = 1   0.9428 0.8131 0.6572 0.5033 0.3671 0.2553 0.1691 0.1064 0.0632 0.0352
x = 2   0.9942 0.9619 0.8948 0.7969 0.6785 0.5518 0.4278 0.3154 0.2201 0.1445
x = 3   0.9996 0.9950 0.9786 0.9437 0.8862 0.8059 0.7064 0.5941 0.4770 0.3633
x = 4   1.0000 0.9996 0.9971 0.9896 0.9727 0.9420 0.8939 0.8263 0.7396 0.6367
x = 5   1.0000 1.0000 0.9998 0.9988 0.9958 0.9887 0.9747 0.9502 0.9115 0.8555
x = 6   1.0000 1.0000 1.0000 0.9999 0.9996 0.9987 0.9964 0.9915 0.9819 0.9648
x = 7   1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998 0.9993 0.9983 0.9961


n = 9
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.6302 0.3874 0.2316 0.1342 0.0751 0.0404 0.0207 0.0101 0.0046 0.0020
x = 1   0.9288 0.7748 0.5995 0.4362 0.3003 0.1960 0.1211 0.0705 0.0385 0.0195
x = 2   0.9916 0.9470 0.8591 0.7382 0.6007 0.4628 0.3373 0.2318 0.1495 0.0898
x = 3   0.9994 0.9917 0.9661 0.9144 0.8343 0.7297 0.6089 0.4826 0.3614 0.2539
x = 4   1.0000 0.9991 0.9944 0.9804 0.9511 0.9012 0.8283 0.7334 0.6214 0.5000
x = 5   1.0000 0.9999 0.9994 0.9969 0.9900 0.9747 0.9464 0.9006 0.8342 0.7461
x = 6   1.0000 1.0000 1.0000 0.9997 0.9987 0.9957 0.9888 0.9750 0.9502 0.9102
x = 7   1.0000 1.0000 1.0000 1.0000 0.9999 0.9996 0.9986 0.9962 0.9909 0.9805
x = 8   1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9992 0.9980


n = 10
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.5987 0.3487 0.1969 0.1074 0.0563 0.0282 0.0135 0.0060 0.0025 0.0010
x = 1   0.9139 0.7361 0.5443 0.3758 0.2440 0.1493 0.0860 0.0464 0.0233 0.0107
x = 2   0.9885 0.9298 0.8202 0.6778 0.5256 0.3828 0.2616 0.1673 0.0996 0.0547
x = 3   0.9990 0.9872 0.9500 0.8791 0.7759 0.6496 0.5138 0.3823 0.2660 0.1719
x = 4   0.9999 0.9984 0.9901 0.9672 0.9219 0.8497 0.7515 0.6331 0.5044 0.3770
x = 5   1.0000 0.9999 0.9986 0.9936 0.9803 0.9527 0.9051 0.8338 0.7384 0.6230
x = 6   1.0000 1.0000 0.9999 0.9991 0.9965 0.9894 0.9740 0.9452 0.8980 0.8281
x = 7   1.0000 1.0000 1.0000 0.9999 0.9996 0.9984 0.9952 0.9877 0.9726 0.9453
x = 8   1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995 0.9983 0.9955 0.9893
x = 9   1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9990

n = 12
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.5404 0.2824 0.1422 0.0687 0.0317 0.0138 0.0057 0.0022 0.0008 0.0002
x = 1   0.8816 0.6590 0.4435 0.2749 0.1584 0.0850 0.0424 0.0196 0.0083 0.0032
x = 2   0.9804 0.8891 0.7358 0.5583 0.3907 0.2528 0.1513 0.0834 0.0421 0.0193
x = 3   0.9978 0.9744 0.9078 0.7946 0.6488 0.4925 0.3467 0.2253 0.1345 0.0730
x = 4   0.9998 0.9957 0.9761 0.9274 0.8424 0.7237 0.5833 0.4382 0.3044 0.1938
x = 5   1.0000 0.9995 0.9954 0.9806 0.9456 0.8822 0.7873 0.6652 0.5269 0.3872
x = 6   1.0000 0.9999 0.9993 0.9961 0.9857 0.9614 0.9154 0.8418 0.7393 0.6128
x = 7   1.0000 1.0000 0.9999 0.9994 0.9972 0.9905 0.9745 0.9427 0.8883 0.8062
x = 8   1.0000 1.0000 1.0000 0.9999 0.9996 0.9983 0.9944 0.9847 0.9644 0.9270
x = 9   1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9992 0.9972 0.9921 0.9807
x = 10  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9989 0.9968
x = 11  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9998

n = 15
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.4633 0.2059 0.0874 0.0352 0.0134 0.0047 0.0016 0.0005 0.0001 0.0000
x = 1   0.8290 0.5490 0.3186 0.1671 0.0802 0.0353 0.0142 0.0052 0.0017 0.0005
x = 2   0.9638 0.8159 0.6042 0.3980 0.2361 0.1268 0.0617 0.0271 0.0107 0.0037
x = 3   0.9945 0.9444 0.8227 0.6482 0.4613 0.2969 0.1727 0.0905 0.0424 0.0176
x = 4   0.9994 0.9873 0.9383 0.8358 0.6865 0.5155 0.3519 0.2173 0.1204 0.0592
x = 5   0.9999 0.9978 0.9832 0.9389 0.8516 0.7216 0.5643 0.4032 0.2608 0.1509
x = 6   1.0000 0.9997 0.9964 0.9819 0.9434 0.8689 0.7548 0.6098 0.4522 0.3036
x = 7   1.0000 1.0000 0.9994 0.9958 0.9827 0.9500 0.8868 0.7869 0.6535 0.5000
x = 8   1.0000 1.0000 0.9999 0.9992 0.9958 0.9848 0.9578 0.9050 0.8182 0.6964
x = 9   1.0000 1.0000 1.0000 0.9999 0.9992 0.9963 0.9876 0.9662 0.9231 0.8491
x = 10  1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 0.9972 0.9907 0.9745 0.9408
x = 11  1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995 0.9981 0.9937 0.9824
x = 12  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9997 0.9989 0.9963
x = 13  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995
x = 14  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000


n = 20
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.3585 0.1216 0.0388 0.0115 0.0032 0.0008 0.0002 0.0000 0.0000 0.0000
x = 1   0.7358 0.3917 0.1756 0.0692 0.0243 0.0076 0.0021 0.0005 0.0001 0.0000
x = 2   0.9245 0.6769 0.4049 0.2061 0.0913 0.0355 0.0121 0.0036 0.0009 0.0002
x = 3   0.9841 0.8670 0.6477 0.4114 0.2252 0.1071 0.0444 0.0160 0.0049 0.0013
x = 4   0.9974 0.9568 0.8298 0.6296 0.4148 0.2375 0.1182 0.0510 0.0189 0.0059
x = 5   0.9997 0.9887 0.9327 0.8042 0.6172 0.4164 0.2454 0.1256 0.0553 0.0207
x = 6   1.0000 0.9976 0.9781 0.9133 0.7858 0.6080 0.4166 0.2500 0.1299 0.0577
x = 7   1.0000 0.9996 0.9941 0.9679 0.8982 0.7723 0.6010 0.4159 0.2520 0.1316
x = 8   1.0000 0.9999 0.9987 0.9900 0.9591 0.8867 0.7624 0.5956 0.4143 0.2517
x = 9   1.0000 1.0000 0.9998 0.9974 0.9861 0.9520 0.8782 0.7553 0.5914 0.4119
x = 10  1.0000 1.0000 1.0000 0.9994 0.9961 0.9829 0.9468 0.8725 0.7507 0.5881
x = 11  1.0000 1.0000 1.0000 0.9999 0.9991 0.9949 0.9804 0.9435 0.8692 0.7483
x = 12  1.0000 1.0000 1.0000 1.0000 0.9998 0.9987 0.9940 0.9790 0.9420 0.8684
x = 13  1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9985 0.9935 0.9786 0.9423
x = 14  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9984 0.9936 0.9793
x = 15  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9985 0.9941
x = 16  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9987
x = 17  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998
x = 18  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000

n = 25
p =     0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
x = 0   0.2774 0.0718 0.0172 0.0038 0.0008 0.0001 0.0000 0.0000 0.0000 0.0000
x = 1   0.6424 0.2712 0.0931 0.0274 0.0070 0.0016 0.0003 0.0001 0.0000 0.0000
x = 2   0.8729 0.5371 0.2537 0.0982 0.0321 0.0090 0.0021 0.0004 0.0001 0.0000
x = 3   0.9659 0.7636 0.4711 0.2340 0.0962 0.0332 0.0097 0.0024 0.0005 0.0001
x = 4   0.9928 0.9020 0.6821 0.4207 0.2137 0.0905 0.0320 0.0095 0.0023 0.0005
x = 5   0.9988 0.9666 0.8385 0.6167 0.3783 0.1935 0.0826 0.0294 0.0086 0.0020
x = 6   0.9998 0.9905 0.9305 0.7800 0.5611 0.3407 0.1734 0.0736 0.0258 0.0073
x = 7   1.0000 0.9977 0.9745 0.8909 0.7265 0.5118 0.3061 0.1536 0.0639 0.0216
x = 8   1.0000 0.9995 0.9920 0.9532 0.8506 0.6769 0.4668 0.2735 0.1340 0.0539
x = 9   1.0000 0.9999 0.9979 0.9827 0.9287 0.8106 0.6303 0.4246 0.2424 0.1148
x = 10  1.0000 1.0000 0.9995 0.9944 0.9703 0.9022 0.7712 0.5858 0.3843 0.2122
x = 11  1.0000 1.0000 0.9999 0.9985 0.9893 0.9558 0.8746 0.7323 0.5426 0.3450
x = 12  1.0000 1.0000 1.0000 0.9996 0.9966 0.9825 0.9396 0.8462 0.6937 0.5000
x = 13  1.0000 1.0000 1.0000 0.9999 0.9991 0.9940 0.9745 0.9222 0.8173 0.6550
x = 14  1.0000 1.0000 1.0000 1.0000 0.9998 0.9982 0.9907 0.9656 0.9040 0.7878
x = 15  1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 0.9971 0.9868 0.9560 0.8852
x = 16  1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9992 0.9957 0.9826 0.9461
x = 17  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9988 0.9942 0.9784
x = 18  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9984 0.9927
x = 19  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9996 0.9980
x = 20  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9995
x = 21  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999
x = 22  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000


0.2146 


0.5535 
0.8122 
0.9392 
0.9844 
0.9967 


0.9994 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0424 


0.1837 
0.4114 
0.6474 
0.8245 
0.9268 


0.9742 
0.9922 
0.9980 
0.9995 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0076 


0.0480 
0.1514 
0.3217 
0.5245 
0.7106 


0.8474 
0.9302 
0.9722 
0.9903 
0.9971 


0.9992 
0.9998 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0012 


0.0105 
0.0442 
0.1227 
0.2552 
0.4275 


0.6070 
0.7608 
0.8713 
0.9389 
0.9744 


0.9905 
0.9969 
0.9991 
0.9998 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0002 


0.0020 
0.0106 
0.0374 
0.0979 
0.2026 


0.3481 
0.5143 
0.6736 
0.8034 
0.8943 


0.9493 
0.9784 
0.9918 
0.9973 
0.9992 


0.9998 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0000 


0.0003 
0.0021 
0.0093 
0.0302 
0.0766 


0.1595 
0.2814 
0.4315 
0.5888 
0.7304 


0.8407 
0.9155 
0.9599 
0.9831 
0.9936 


0.9979 
0.9994 
0.9998 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0003 
0.0019 
0.0075 
0.0233 


0.0586 
0.1238 
0.2247 
0.3575 
0.5078 


0.6548 
0.7802 
0.8737 
0.9348 
0.9699 


0.9876 
0.9955 
0.9986 
0.9996 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0003 
0.0015 
0.0057 


0.0172 
0.0435 
0.0940 
0.1763 
0.2915 


0.4311 
0.5785 
0.7145 
0.8246 
0.9029 


0.9519 
0.9788 
0.9917 
0.9971 
0.9991 


0.9998 
1.0000 
1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0002 
0.0011 


0.0040 
0.0121 
0.0312 
0.0694 
0.1350 


0.2327 
0.3592 
0.5025 
0.6448 
0.7691 


0.8644 
0.9286 
0.9666 
0.9862 
0.9950 


0.9984 
0.9996 
0.9999 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 
0.0002 


0.0007 
0.0026 
0.0081 
0.0214 
0.0494 


0.1002 
0.1808 
0.2923 
0.4278 
0.5722 


0.7077 
0.8192 
0.8998 
0.9506 
0.9786 


0.9919 
0.9974 
0.9993 
0.9998 
1.0000 
0.1285 


0.3991 
0.6767 
0.8619 
0.9520 


0.9861 


0.9966 
0.9993 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0148 


0.0805 
0.2228 
0.4231 
0.6290 


0.7937 


0.9005 
0.9581 
0.9845 
0.9949 
0.9985 


0.9996 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0015 


0.0121 
0.0486 
0.1302 
0.2633 


0.4325 


0.6067 
0.7559 
0.8646 
0.9328 
0.9701 


0.9880 
0.9957 
0.9986 
0.9996 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0001 


0.0015 
0.0079 
0.0285 
0.0759 


0.1613 


0.2859 
0.4371 
0.5931 
0.7318 
0.8392 


0.9125 
0.9568 
0.9806 
0.9921 
0.9971 


0.9990 
0.9997 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0000 


0.0001 
0.0010 
0.0047 
0.0160 


0.0433 


0.0962 
0.1820 
0.2998 
0.4395 
0.5839 


0.7151 
0.8209 
0.8968 
0.9456 
0.9738 


0.9884 
0.9953 
0.9983 
0.9994 
0.9998 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0000 


0.0000 
0.0001 
0.0006 
0.0026 


0.0086 


0.0238 
0.0553 
0.1110 
0.1959 
0.3087 


0.4406 
0.5772 
0.7032 
0.8074 
0.8849 


0.9367 
0.9680 
0.9852 
0.9937 
0.9976 


0.9991 
0.9997 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0001 
0.0003 


0.0013 


0.0044 
0.0124 
0.0303 
0.0644 
0.1215 


0.2053 
0.3143 
0.4408 
0.5721 
0.6946 


0.7978 
0.8761 
0.9301 
0.9637 
0.9827 


0.9925 
0.9970 
0.9989 
0.9996 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 


0.0001 


0.0006 
0.0021 
0.0061 
0.0156 
0.0352 


0.0709 
0.1285 
0.2112 
0.3174 
0.4402 


0.5681 
0.6885 
0.7911 
0.8702 
0.9256 


0.9608 
0.9811 
0.9917 
0.9966 
0.9988 


0.9996 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 


0.0000 


0.0001 
0.0002 
0.0009 
0.0027 
0.0074 


0.0179 
0.0386 
0.0751 
0.1326 
0.2142 


0.3185 
0.4391 
0.5651 
0.6844 
0.7870 


0.8669 
0.9233 
0.9595 
0.9804 
0.9914 


0.9966 
0.9988 
0.9996 
0.9999 
1.0000 


1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 


0.0000 


0.0000 
0.0000 
0.0001 
0.0003 
0.0011 


0.0032 
0.0083 
0.0192 
0.0403 
0.0769 


0.1341 
0.2148 
0.3179 
0.4373 
0.5627 


0.6821 
0.7852 
0.8659 
0.9231 
0.9597 


0.9808 
0.9917 
0.9968 
0.9989 
0.9997 


0.9999 
1.0000 
0.0769 


0.2794 
0.5405 
0.7604 
0.8964 
0.9622 


0.9882 
0.9968 
0.9992 
0.9998 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0052 


0.0338 
0.1117 
0.2503 
0.4312 
0.6161 


0.7702 
0.8779 
0.9421 
0.9755 
0.9906 


0.9968 
0.9990 
0.9997 
0.9999 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0003 


0.0029 
0.0142 
0.0460 
0.1121 
0.2194 


0.3613 
0.5188 
0.6681 
0.7911 
0.8801 


0.9372 
0.9699 
0.9868 
0.9947 
0.9981 


0.9993 
0.9998 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0000 


0.0002 
0.0013 
0.0057 
0.0185 
0.0480 


0.1034 
0.1904 
0.3073 
0.4437 
0.5836 


0.7107 
0.8139 
0.8894 
0.9393 
0.9692 


0.9856 
0.9937 
0.9975 
0.9991 
0.9997 


0.9999 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0001 
0.0005 
0.0021 
0.0070 


0.0194 
0.0453 
0.0916 
0.1637 
0.2622 


0.3816 
0.5110 
0.6370 
0.7481 
0.8369 


0.9017 
0.9449 
0.9713 
0.9861 
0.9937 


0.9974 
0.9990 
0.9996 
0.9999 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0002 
0.0007 


0.0025 
0.0073 
0.0183 
0.0402 
0.0789 


0.1390 
0.2229 
0.3279 
0.4468 
0.5692 


0.6839 
0.7822 
0.8594 
0.9152 
0.9522 


0.9749 
0.9877 
0.9944 
0.9976 
0.9991 


0.9997 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 
0.0001 


0.0002 
0.0008 
0.0025 
0.0067 
0.0160 


0.0342 
0.0661 
0.1163 
0.1878 
0.2801 


0.3889 
0.5060 
0.6216 
0.7264 
0.8139 


0.8813 
0.9290 
0.9604 
0.9793 
0.9900 


0.9955 
0.9981 
0.9993 
0.9997 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 
0.0000 


0.0000 
0.0001 
0.0002 
0.0008 
0.0022 


0.0057 
0.0133 
0.0280 
0.0540 
0.0955 


0.1561 
0.2369 
0.3356 
0.4465 
0.5610 


0.6701 
0.7660 
0.8438 
0.9022 
0.9427 


0.9686 
0.9840 
0.9924 
0.9966 
0.9986 


0.9995 
0.9998 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 
0.0000 


0.0000 
0.0000 
0.0000 
0.0001 
0.0002 


0.0006 
0.0018 
0.0045 
0.0104 
0.0220 


0.0427 
0.0765 
0.1273 
0.1974 
0.2862 


0.3900 
0.5019 
0.6134 
0.7160 
0.8034 


0.8721 
0.9220 
0.9556 
0.9765 
0.9884 


0.9947 
0.9978 
0.9991 
0.9997 
0.9999 


1.0000 
1.0000 
1.0000 


0.0000 


0.0000 
0.0000 
0.0000 
0.0000 
0.0000 


0.0000 
0.0000 
0.0000 
0.0000 
0.0000 


0.0000 
0.0002 
0.0005 
0.0013 
0.0033 


0.0077 
0.0164 
0.0325 
0.0595 
0.1013 


0.1611 
0.2399 
0.3359 
0.4439 
0.5561 


0.6641 
0.7601 
0.8389 
0.8987 
0.9405 


0.9675 
0.9836 
0.9923 
0.9967 
0.9987 


0.9995 
0.9998 
1.0000 


Appendix 


PERCENTAGE POINTS OF THE NORMAL DISTRIBUTION 


The values z in the table are those which a random variable Z ~ N(0, 1) exceeds with probability p; that is, 


P(Z > z) = 1 − Φ(z) = p. 


p z p z 
0.5000 0.0000 0.0500 1.6449 
0.4000 0.2533 0.0250 1.9600 
0.3000 0.5244 0.0100 2.3263 
0.2000 0.8416 0.0050 2.5758 
0.1500 1.0364 0.0010 3.0902 
0.1000 1.2816 0.0005 3.2905 
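
The percentage points above can be reproduced numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `upper_percentage_point` is illustrative and not part of the original tables.

```python
from statistics import NormalDist


def upper_percentage_point(p: float) -> float:
    """Return z such that P(Z > z) = p for Z ~ N(0, 1).

    Since P(Z > z) = 1 - Phi(z), this is the inverse CDF evaluated at 1 - p.
    """
    return NormalDist().inv_cdf(1 - p)


# Print a few of the tabulated probabilities alongside their z values.
for p in (0.5, 0.05, 0.025, 0.0005):
    print(f"p = {p:<7} z = {upper_percentage_point(p):.4f}")
```

The printed values can be compared against the table rows by eye; small differences in the last decimal place would be due to rounding.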
POISSON CUMULATIVE DISTRIBUTION FUNCTION 


The tabulated value is P(X ≤ x), where X has a Poisson distribution with parameter λ. 


0.6065 


0.9098 
0.9856 
0.9982 
0.9998 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.3679 


0.7358 
0.9197 
0.9810 
0.9963 
0.9994 


0.9999 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.2231 


0.5578 
0.8088 
0.9344 
0.9814 
0.9955 


0.9991 
0.9998 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.1353 


0.4060 
0.6767 
0.8571 
0.9473 
0.9834 


0.9955 
0.9989 
0.9998 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.0821 


0.2873 
0.5438 
0.7576 
0.8912 
0.9580 


0.9858 
0.9958 
0.9989 
0.9997 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.0498 


0.1991 
0.4232 
0.6472 
0.8153 
0.9161 


0.9665 
0.9881 
0.9962 
0.9989 
0.9997 


0.9999 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.0302 


0.1359 
0.3208 
0.5366 
0.7254 
0.8576 


0.9347 
0.9733 
0.9901 
0.9967 
0.9990 


0.9997 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.0183 


0.0916 
0.2381 
0.4335 
0.6288 
0.7851 


0.8893 
0.9489 
0.9786 
0.9919 
0.9972 


0.9991 
0.9997 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.0111 


0.0611 
0.1736 
0.3423 
0.5321 
0.7029 


0.8311 
0.9134 
0.9597 
0.9829 
0.9933 


0.9976 
0.9992 
0.9997 
0.9999 
1.0000 


1.0000 
1.0000 
1.0000 
1.0000 


0.0067 


0.0404 
0.1247 
0.2650 
0.4405 
0.6160 


0.7622 
0.8666 
0.9319 
0.9682 
0.9863 


0.9945 
0.9980 
0.9993 
0.9998 
0.9999 


1.0000 
1.0000 
1.0000 
1.0000 
0.0041 


0.0266 
0.0884 
0.2017 
0.3575 
0.5289 


0.6860 
0.8095 
0.8944 
0.9462 
0.9747 


0.9890 
0.9955 
0.9983 
0.9994 
0.9998 


0.9999 
1.0000 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0025 


0.0174 
0.0620 
0.1512 
0.2851 
0.4457 


0.6063 
0.7440 
0.8472 
0.9161 
0.9574 


0.9799 
0.9912 
0.9964 
0.9986 
0.9995 


0.9998 
0.9999 
1.0000 
1.0000 
1.0000 


1.0000 
1.0000 


0.0015 


0.0113 
0.0430 
0.1118 
0.2237 
0.3690 


0.5265 
0.6728 
0.7916 
0.8774 
0.9332 


0.9661 
0.9840 
0.9929 
0.9970 
0.9988 


0.9996 
0.9998 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 


0.0009 


0.0073 
0.0296 
0.0818 
0.1730 
0.3007 


0.4497 
0.5987 
0.7291 
0.8305 
0.9015 


0.9467 
0.9730 
0.9872 
0.9943 
0.9976 


0.9990 
0.9996 
0.9999 
1.0000 
1.0000 


1.0000 
1.0000 


0.0006 


0.0047 
0.0203 
0.0591 
0.1321 
0.2414 


0.3782 
0.5246 
0.6620 
0.7764 
0.8622 


0.9208 
0.9573 
0.9784 
0.9897 
0.9954 


0.9980 
0.9992 
0.9997 
0.9999 
1.0000 


1.0000 
1.0000 


0.0003 


0.0030 
0.0138 
0.0424 
0.0996 
0.1912 


0.3134 
0.4530 
0.5925 
0.7166 
0.8159 


0.8881 
0.9362 
0.9658 
0.9827 
0.9918 


0.9963 
0.9984 
0.9993 
0.9997 
0.9999 


1.0000 
1.0000 


0.0002 


0.0019 
0.0093 
0.0301 
0.0744 
0.1496 


0.2562 
0.3856 
0.5231 
0.6530 
0.7634 


0.8487 
0.9091 
0.9486 
0.9726 
0.9862 


0.9934 
0.9970 
0.9987 
0.9995 
0.9998 


0.9999 
1.0000 


0.0001 


0.0012 
0.0062 
0.0212 
0.0550 
0.1157 


0.2068 
0.3239 
0.4557 
0.5874 
0.7060 


0.8030 
0.8758 
0.9261 
0.9585 
0.9780 


0.9889 
0.9947 
0.9976 
0.9989 
0.9996 


0.9998 
0.9999 


0.0001 


0.0008 
0.0042 
0.0149 
0.0403 
0.0885 


0.1649 
0.2687 
0.3918 
0.5218 
0.6453 


0.7520 
0.8364 
0.8981 
0.9400 
0.9665 


0.9823 
0.9911 
0.9957 
0.9980 
0.9991 


0.9996 
0.9999 


0.0000 


0.0005 
0.0028 
0.0103 
0.0293 
0.0671 


0.1301 
0.2202 
0.3328 
0.4579 
0.5830 


0.6968 
0.7916 
0.8645 
0.9165 
0.9513 


0.9730 
0.9857 
0.9928 
0.9965 
0.9984 


0.9993 
0.9997 
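
Any entry of the Poisson table can be recomputed by summing the probability mass function. The sketch below assumes only the definition P(X = k) = e^(−λ) λ^k / k!; the function name `poisson_cdf` is illustrative.

```python
import math


def poisson_cdf(x: int, lam: float) -> float:
    """Return P(X <= x) for X ~ Po(lam) by summing the probability mass function."""
    term = math.exp(-lam)   # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= lam / k     # P(X = k) obtained from P(X = k - 1)
        total += term
    return total


# Spot checks against tabulated entries (lambda = 0.5, 1.0 and 5.0).
print(round(poisson_cdf(0, 0.5), 4))
print(round(poisson_cdf(1, 1.0), 4))
print(round(poisson_cdf(5, 5.0), 4))
```

Building each term from the previous one avoids computing large factorials directly.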
PERCENTAGE POINTS OF THE χ² DISTRIBUTION 


The values in the table are those which a random variable with the χ² distribution on ν degrees of freedom exceeds with the probability shown. 


ν 0.995 0.990 0.975 0.950 0.900 0.100 0.050 0.025 0.010 0.005 
1 0.000 0.000 0.001 0.004 0.016 2.705 3.841 5.024 6.635 7.879 
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597 
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838 
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860 
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.832 15.086 16.750 
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548 
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278 
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955 
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589 
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188 
11 2.603 3.053 3.816 4.575 5.580 17.275 19.675 21.920 24.725 26.757 
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300 
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819 
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319 
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801 
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267 
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718 
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156 
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582 
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997 
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401 
22 8.643 9.542 10.982 12.338 14.042 30.813 33.924 36.781 40.289 42.796 
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181 
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.558 
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928 
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290 
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.194 46.963 49.645 
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993 
29 13.121 14.256 16.047 17.708 19.768 39.088 42.557 45.722 49.588 52.336 
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672 
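
The χ² percentage points can be checked without a statistics library. This is a rough sketch, assuming the standard series for the regularised lower incomplete gamma function and a simple bisection search; the names `chi2_cdf` and `chi2_upper_point` are illustrative, and the search bound of 200 is an arbitrary assumption that comfortably covers the tabulated range.

```python
import math


def chi2_cdf(x: float, nu: int) -> float:
    """P(X <= x) for a chi-squared variable on nu degrees of freedom.

    Uses the series gamma(a, t) = t^a e^(-t) * sum_{n>=0} t^n / (a (a+1) ... (a+n))
    with a = nu / 2 and t = x / 2, divided by Gamma(a).
    """
    if x <= 0:
        return 0.0
    a, t = nu / 2, x / 2
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= t / (a + n)
        total += term
        n += 1
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))


def chi2_upper_point(prob: float, nu: int, lo: float = 0.0, hi: float = 200.0) -> float:
    """Value c with P(X > c) = prob, found by bisection on the CDF."""
    target = 1.0 - prob
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Compare with the 0.050 column for nu = 1, 2 and 10.
for nu in (1, 2, 10):
    print(nu, round(chi2_upper_point(0.05, nu), 3))
```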
Answers 


Prior knowledge 1   5 a d | 0 | 1 | 2 | 3 
1 a 0.296 b 0.677 c 1.34x 10°   P(D = d) | 1/4 | 3/8 | 1/4 | 1/8 
2 a k= ye b a 
3 x = 3, y = −2, z = 0   b 1.25 
c 15/16 = 0.9375 
6 a P(T = 1) = P(head) = 0.5, 
Exercise 1A   P(T = 2) = P(tail, head) = 0.5 × 0.5 = 0.25, 
1 a E(X) = 4.6, E(X²) = 26   P(T = 3) = 1 − P(T = 1) − P(T = 2) = 0.25 
b E(X) = 0.3, E(X²) = 2.5   b E(T) = 1.75, Var(T) = 11/16 = 0.688 
2 E(X) = 4, E(X²) = 18.2   7 a E(X)=4a+20 
3 a x | 2 | 3 | 6   b a = 0.375, b = 0.25 
P(X = x) | 1/2 | 1/3 | 1/6   Exercise 1C 
b E(X) = 3, E(X²) = 11   1 a y | −1 | 1 | 3 | 5 
c No 
P(Y = y) | 0.1 | 0.3 | 0.2 | 0.4 
4 a x | 1 | 2 | 3 | 4 | 5 
b E(Y) = 2.8 
P(X = x) | 1/2 | 1/4 | 1/8 | 1/16 | 1/16   c E(X) = 2.9 and 2E(X) − 3 = 5.8 − 3 = 2.8 = E(Y). 
b E(X) = 1.9375, E(X²) = 5.1875   2 a y | −8 | −1 | 0 | 1 | 8 
Cc 10} 
5 a=03,b=-03   P(Y = y) | 0.1 | 0.1 | 0.2 | 0.4 | 0.2 
6 a = 0.3, b = 0.4, c = 0.2   b E(Y) = 1.1 
7 a = 0.1, b = 0.4   3 a 8 b 4 c 2 d 18 
8 Xx 1] 2 4 e 8 2 
+ : : : - : : 4 a 6 b -9 c -2 dl 
P(X = x) 8 8 8 8 20 20 e 9 > 
b X= number of 6s in 10 rolls, then X ~ B(10, ) om fue ai ete see 
P(X = 3) = 0.738 aa 
9 P=05 oa Se 
b Y=200+ 100X 
hall c E(Y) = £550 
oe a 7 726.5cm? 
E(X) = 37 8 a E(X)=1.25, Var(X) = 0.9375 


b E(Y) = 1/4 × 1 + 3/8 × 2 + 1/4 × 4 + 1/8 × 8 = 3 


Exercise 1B E(Z)= 2E(X) + 3 29 


1 al b 2 
2 a E(x) =2= 1.83, Var(X) = 2 = 0.472 ce NASON A at 9 
b E(X) =0, Var(X) = 0.5 
= = Challenge 
= 4.5, =5. = E(X) — 2E(X)E(X)+ (E(X))2 = EW?) — EX 
4 a s | P(S=s) 
2 oa Exercise 1D 
1 a E(X)=2 b Var(¥)=2 c 1.414 
3 36 2 a E(xX)=2 b Var(¥)=4 c¢ E(X)=8 
4 3 3 a=01,b=04 
* 4 a -03<E(Y)<04 
5 36 b a=0.5, b= 0.2 
6 5 5 a 1=)P(¥=x)=2a+2b+¢ 
Bh 24=E(Y)=a+c+4b+9a=10a+4b4+c 
Gj + 0.4=P(Y>2)=a+b 
3 5 b a=0.1,0=0.3,c=0.2 
36 c P(2X¥4+3<Y)=P2<X%)=2a=0.2 
9 * 6 a E(X)=3.3 
10 zy b 1=y)P(¥=x)=3a+2b+¢ 
36 3.3 = E(X) = 6a + 9b + 6¢ 
11 a 0.6 = PY <-5) = PLY = 3) =a+2b4c 
; ¢ a=0.2,b=0.1,c=0.2 
12 36 d PW>5+Y)=P(X> 2)=06 
b 7 
c¢ 5.833 
d 2.415 


Mixed exercise 1 


1 


wh 


10 


11 


12 


13 
14 


a 


iw eseoner o & 


ao peaocmrne ads ce on ona 


ca 


a0 & 


a 


P(X = x) 


ie) tw wo tw te te 
No |Nlo] [4 [Se] S)r [85 


21 
D = 8.89 f 23 -108.3 


0.2 b 0.7 c 3.6 d 8.04 

0.3 b E(X) = 0 × 0.2 + 1 × 0.3 + 2 × 0.5 = 1.3 
0.61 d 0.5 

k + 0 + k + 2k = 1, 

so 4k = 1, so k = 0.25 

E(X) = 2 

E(X²) = 0² × 0.25 + 1² × 0 + 2² × 0.25 + 3² × 0.5 
= 1 + 4.5 = 5.5 

6 

ee ee 
0.2854 

0.3 b 2.3 c 1.61 d 0.35 
1.46 f 0.281 

Discrete uniform distribution 

Any discrete distribution where all the probabilities 
are the same. 

2 

2 

p+q=0.5, 2p + 3q=1.3 

p=0.2,q=0.3. 

1.29 

5.16 


1 31 
3 b> 


Var(X) = E(X?) — E(x)? 


2 
= 128 _ (22) = Bt = 2.02 (3 s.£.) 


32 or 0.731 c or 3.077 
Var(X) = E(X’) — E(x)? 

270 /(40\" 
ae (3) ~ 0.92 (2 s.f) 
8.3(2s.f) 

7 b -4 c 81 d 81 
13 f 12 


e 
E(S) = 64, Var(S) = 225 


a 


b 


x 1 2 3 
P(X = x) 0.25 | 0.375 | 0.375 


2.125 c 0.609 d 5.25 e 5.48 


15 a 0.2 b 0.76 c 1.07 d 0.0844 
16 a a = 0.4, b = 0.2 
b E(X²) = 1.3, Var(X) = 0.81 
c Var(Y) = 7.29 
d P(Y + 2 > X) = P(3X − 1 + 2 > X) = P(X > −0.5) = 0.9 
17 a 2a + 2b + c = 1, 3b + 4c = 2.3, 2a + b = 0.4 
b a = 0.15, b = 0.1, c = 0.5 
c P(-2X > 10Y) = P(X > 1) =0.75 
Challenge 
E(X) = Σ_{r=1}^{n} r × (1/n) = n(n + 1)/(2n) = (n + 1)/2 
E(X²) = Σ_{r=1}^{n} r² × (1/n) = n(n + 1)(2n + 1)/(6n) = (n + 1)(2n + 1)/6 
Var(X) = E(X²) − (E(X))² = (n + 1)(2n + 1)/6 − (n + 1)²/4 
= (4n² + 6n + 2 − 3n² − 6n − 3)/12 = (n + 1)(n − 1)/12 
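
The two closed forms in this Challenge can be checked with exact arithmetic. The sketch below assumes, as the sums suggest, that X takes the values 1, 2, ..., n each with probability 1/n; the function name `uniform_moments` is illustrative.

```python
from fractions import Fraction


def uniform_moments(n: int):
    """Return (E(X), Var(X)) for X uniform on 1..n, computed directly from the definition."""
    p = Fraction(1, n)
    mean = sum(r * p for r in range(1, n + 1))
    mean_sq = sum(r * r * p for r in range(1, n + 1))
    return mean, mean_sq - mean ** 2


# Compare the direct calculation with (n + 1)/2 and (n + 1)(n - 1)/12.
for n in (2, 6, 10):
    mean, var = uniform_moments(n)
    print(n, mean, var, Fraction(n + 1, 2), Fraction((n + 1) * (n - 1), 12))
```

Using `Fraction` keeps every value exact, so the comparison does not depend on floating-point rounding.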
Prior knowledge 2 
1 a 0.0168 b 0.001 c 0.3972 
2 a E(X) = 3.8 b E(X²) = 18 c Var(X) = 3.56 
Exercise 2A 
1 a 0.2138 b 0.7127 c 0.4703 
2 a 0.1733 b 0.8153 c 0.7531 
3 a 0.1323 b 0.3954 c 0.5429 
4 a 0.3626 b 0.5683 c 0.1950 
5 λ = 3 
6 λ = 6 
Exercise 2B 
1 a 0.2017 b 0.4711 c 0.7211 
2 a 0.7798 b 0.6615 c 0.3035 
3 a 0.8641 b 0.6139 c 0.5368 
4 a 0.4679 b 0.3606 c 0.8200 
5 a 6 b 9 c 5 d 5 
6 a 5 b 2 c All values of c > 6 
d All values of d > 8 


Exercise 2C 


1 


2 a 


ww 
ces 


NS oO 


spemenmasrrer rao 


10 a 
ll a 
12 a 


i 0.1680 ii 0.0839 
i 0.1606 ii 0.2851 


(1) Weeds grow independently 

(2) Weeds grow at a constant rate/unit of area 
0.1088 c 0.2084 

X ~ Po(2.5) 

(1) Faults occur independently 

(2) Faults occur at a constant rate 

0.2565 d 0.7586 e 0.8699 
i 0.1755 ii 0.0681 

i 0.7586 ii 0.8622 

0.0839 b 0.1512 

0.1247 b 0.0137 

i 0.1653 ii 0.1607 iii 0.2694 
0.4202 

i 0.1336 ii 0.4562 

0.7135 

i 0.5276 ii 0.1329 

0.5276 — breakdowns occur independently of each 
other 

0.1255 b 0.1512 c 0.1670 
0.1247 b 0.0260 

0.2650 b 11 minibuses 
13 a 0.2971 
b X ~ Po(3). P(X > 8) = 1 − P(X ≤ 8) = 1 − 0.9962 
= 0.0038 = 0.38% 
c 10 
14 a 0.8088 b 0.1847 c 14 
Exercise 2D 
1 a 0.1606 b 0.7440 c 0.7149 
2 a 0.1465 b 0.2414 c 0.2236 
3 a 0.0474 b 0.6159 c 0.3099 d 0.2851 
4 a 0.5049 b 0.3134 
5 a i 0.1213 ii 0.7166 
b Events occur at a constant average rate: the mean 
number in an interval is proportional to the length 
of the interval. 
6 a 0.2238 b 0.2707 c 0.4579 
7 a 0.6988 b 0.3153 c 0.1607 
8 a 0.2090 b 0.3374 c 0.4457 
9 a 0.0158 b 0.7534 c 0.2417 
10 a 0.2639 b 0.7657 c 0.0754 
Challenge 
a Q ~ Po(λ + μ) 
P(Q = 0) = e^(−(λ + μ)) × (λ + μ)⁰ / 0! 
(λ + μ)⁰ = 1 and 0! = 1 
therefore P(Q = 0) = e^(−(λ + μ)) 
b Q ~ Po(λ + μ) 
P(Q = 1) = e^(−(λ + μ)) × (λ + μ)¹ / 1! 
(λ + μ)¹ = (λ + μ) and 1! = 1 
therefore P(Q = 1) = e^(−(λ + μ)) × (λ + μ) 


Exercise 2E 


1 a Mean = 1.43, Variance = 1.4251 
b Mean ≈ variance 
c Using λ = 1.4, P = 0.1128 
2 a Mean = 3.64, Variance = 3.5604 
b Mean ≈ variance 
c Using λ = 3.6, P(X ≤ 2) = 0.3027 
d From the table relative frequency of obtaining no 
more than 2 cars per period is 0.29. Answer from c 
is very close to recorded value. 
3 a Mean = 2.867, Variance = 2.897 
b Mean ≈ variance 
c Because the observed frequency for 8 or more flaws 
was 0. 
d 99 (using λ = 2.9) 
Challenge 
Proof outline: 


E(X) = Σ_{i=0}^{∞} i × P(X = i) = Σ_{i=0}^{∞} i × e^(−λ) λ^i / i! = e^(−λ) Σ_{i=1}^{∞} λ^i / (i − 1)! 

= e^(−λ) Σ_{i=0}^{∞} λ^(i+1) / i! = e^(−λ) × λ Σ_{i=0}^{∞} λ^i / i! = e^(−λ) × λ × e^λ = λ 


Var(X) = E(X²) − E²(X) 
Using a similar approach to above to gain: E(X²) = λ² + λ 
Var(X) = (λ² + λ) − λ² = λ 
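
The result that the mean and variance of a Poisson distribution both equal λ can be checked numerically. This is a minimal sketch that truncates the infinite sums after a fixed number of terms; the cut-off of 200 terms is an assumption that is ample for small λ, and the name `poisson_mean_var` is illustrative.

```python
import math


def poisson_mean_var(lam: float, terms: int = 200):
    """Approximate E(X) and Var(X) for X ~ Po(lam) by truncating the defining sums."""
    p = math.exp(-lam)          # P(X = 0)
    mean = 0.0
    mean_sq = 0.0
    for i in range(terms):
        mean += i * p
        mean_sq += i * i * p
        p *= lam / (i + 1)      # move on to P(X = i + 1)
    return mean, mean_sq - mean ** 2


mean, var = poisson_mean_var(2.9)
print(mean, var)   # both should be very close to 2.9
```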


Exercise 2F 


1 a 84 b 2.52 

2 a 8 b 0.1239 c 0.3154 
3 0.4 or 0.6 

4 0.2 or 0.8 


= 


anaes oa & 


Answers 


0.3 


uP 
iY 
x) 
I 
oO 
nS 


b 0.1643 

ii 0.9051 

ii 27.3 
b Mean = 69.26 Variance = 29.28 
; b_ E(X) =1.5, Var(X¥)=1.05 
Mean = 1, Variance = 0.8 
0.2 
164, 205, 102, 26, 3,0 
The values support the student’s suggestion that the 
data can be modelled by a binomial distribution 
Variance = 5 x 0.2 x 0.8 = 0.8 
The calculated variance matches the observed 
variance of the data and supports the use of a 
binomial distribution 


Challenge 


a 

P(X = 0) = (3 choose 0) × p⁰ × (1 − p)³ = (1 − p)³ 

P(X = 1) = (3 choose 1) × p¹ × (1 − p)² = 3p(1 − p)² 

P(X = 2) = (3 choose 2) × p² × (1 − p)¹ = 3p²(1 − p) 

P(X = 3) = (3 choose 3) × p³ × (1 − p)⁰ = p³ 


E(X) = Σ x P(X = x) 

= (0 × (1 − p)³) + (1 × 3p(1 − p)²) + (2 × 3p²(1 − p)) + (3 × p³) 
= 3p − 6p² + 3p³ + 6p² − 6p³ + 3p³ = 3p 


E(X²) = (0² × (1 − p)³) + (1² × 3p(1 − p)²) + (2² × 3p²(1 − p)) + (3² × p³) 

= 3p − 6p² + 3p³ + 12p² − 12p³ + 9p³ = 3p + 6p² 
Var(X) = E(X²) − E²(X) = 3p + 6p² − (3p)² = 3p(1 − p) 
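
The algebra above can be spot-checked for particular values of p. The sketch below assumes X ~ B(3, p) as in the Challenge and uses exact fractions; the function name `binomial3_moments` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial3_moments(p: Fraction):
    """Return (E(X), Var(X)) for X ~ B(3, p), computed from the four probabilities."""
    pmf = [comb(3, x) * p**x * (1 - p)**(3 - x) for x in range(4)]
    mean = sum(x * q for x, q in enumerate(pmf))
    mean_sq = sum(x * x * q for x, q in enumerate(pmf))
    return mean, mean_sq - mean**2


# Compare with 3p and 3p(1 - p) for a couple of values of p.
for p in (Fraction(1, 5), Fraction(1, 2)):
    print(p, binomial3_moments(p), (3 * p, 3 * p * (1 - p)))
```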


Exercise 2G 


1 a i 0.1781 ii 0.1183 
b i 0.1755 ii 0.1247 
2 a i 0.1628 ii 0.1458 
b i 0.1606 ii 0.1512 
3 a i 0.1963 ii 0.2351 
b i 0.1954 ii 0.2381 
4 a 0.1075 b 0.1074 
c The two values are similar, so a Poisson distribution 
is a good approximation in this case. 
5 a 0.6472 b 0.2240 
6 a 0.0984 b 0.8743 
7 a 0.1422 b 0.3782 
8 a 0.1991 b 0.6472 
9 a X~B(10,0.05) b 0.0105 c 0.5298 
10 0.3498 
11 a X~ B(1200, 0.005) 
b Mean = 6, Variance = 5.97 
c 0.2851 
12 a 0.2378 b 0.0315 
13 a 0.7350 b 0.2788 
14 a 0.2068 b 0.1242 
15 a 0.0186 b 0.0863 
Mixed exercise 2 
1 a 0.4966 b 0.2700 c 0.2376 
2 a (1) Misprints occur independently 
(2) Misprints occur at a constant rate 
b 0.3425 c 0.1689 
3° AS5 


4 a (1) Emails arrive independently 


(2) Emails arrive at a constant rate 
b i 0.1377 
ii 0.2560 
5 a n is large, p is small 
b 0.4253 c 0.4335 d 1.93% 
6 λ = 7 
7 a i 0.2627 ii 0.0582 
b 0.2560 
8 0.2022 
9 a i 0.1336 ii 0.4562 
b 0.2084 c 0.2992 
10 a X ~ Po(6), properties are sold independently and at 
a constant rate 
b 0.1606 c 0.1090 or 0.1091 (using 
unrounded answer for part a) 
11 a 0.3848 b 0.1804 c 0.0440 
12 a 0.3285 b 0.1042 c 0.3134 
13 a X ~ B(150, 0.04) 
b 0.0174 c 0.9380 
14 a 0.0162 b Mean = 10, Variance = 9.8 
c 0.2202 
15 a 0.0230 b Mean = 3, Variance = 2.925 
c 0.0335 
16 a 0.1321 b 0.7135 c 0.3191 
17 a 0.1141 b 0.0103 
18 a X ~ Po(4). Website visits occur independently of each 
other and at a constant average rate. 
b 0.0298 c 0.2834 
19 a Mean = 2.86, Variance = 2.867 
b Mean ≈ variance 
c Using λ = 2.86, P = 0.4553 


Challenge 


a £ or 0.2461 (4 d.p.) 


b 755 or 0.0547 (4 d.p.) 


Prior knowledge 3 


1 a io.41171 ii 0.4159 iii 0.0565 

b i8 ii 4.8 

25 
2 a6 
Exercise 3A 
1 a 0.0347 b 0.6794 c 0.5803 
2 a 0.0623 b 0.4565 c 0.4324 
3 a 0.0965 b 0.4213 c 0.4823 d 0.4984 
4 a i 0.147 ii 0.3430 

b Attempts are independent; probability stays constant 
5 ait ii 0.0791 iii 0.6836 

b The probability is the same on each attempt 
6 a3 b 0.9085 
7 a 15 b 3 c 94 
8 a 0.0531 b 0.5905 
9 a 0.0315 b 0.4877 
10 a 0.0469 b 0.0156 
11 a 0.3712 b 0.0580 c 0.0244 
Exercise 3B 
1a b 20 
2 a 3 b 6 
3 a 0.0796 b 0.0429 

+ 20 + 140 

c i 7 or 1.538 ii G9 Or 0.8284 
4 az b 12 
5 a 4 b 5 

1 


7 a Geometric b Constant p; independent trials 
c 0.2 d 5 e 20 
8 ai® ii 3° b 0.0921 ¢ 0.3206 
d 0.6229 
9 a Constant p; independent trials 
b i 0.1056 ii 0.7744 
c E(X) = 2, Var(x) = 3° 
d 0.0086 e 0.0098 
10 a Poisson; faults occur independently and at random, 


long term average is constant. 


b 0.0474 ce 0.0354 
d_ E(X) = 21, Var(X) = 424 (both nearest whole number) 
e 0.1196 
Exercise 3C 
1 0.0659 
2 0.1668 
3 0.0552 
4 a 0.1406 b 0.0330 c 0.2816 d 0.4744 
5 a 0.1029 b 0.6496 c 0.0953 d 0.2763 
6 a 0.1853 
b Games are independent and probability of success 
is same in each game. 
c 0.1838 d 0.2660 
7 a 0.1409 
b Trials are independent and probability of success is 
same in each trial. 
c 0.9806 d 0.1958 
8 a 0.0285 b 0.6474 c 0.0272 d 0.4629 
9 a 0.0515 b 0.4202 c 0.9993 d 0.0095 
10 a D ~ Negative binomial (3, p) 
b i 0.1148 ii 0.5323 iii 0.0647 
c The probability of success might change as she gets 
more practice. 
Challenge 


a P(Y ≤ 8) = 1 − P(Y > 8) 
P(Y > 8) is the probability of 2 or fewer successes in 
first 8 trials 
So P(Y ≤ 8) = 1 − P(X ≤ 2) where X ~ B(8, 0.4) 
So P(Y ≤ 8) = 1 − F₈,₀.₄(2) 
b P(Y ≤ y) = 1 − P(Y > y) 
P(Y > y) is the probability of r − 1 or fewer successes in 
first y trials 
So P(Y ≤ y) = 1 − P(X ≤ r − 1) where X ~ B(y, p) 
So P(Y ≤ y) = 1 − F_y,p(r − 1) 
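
The relationship in this Challenge between the negative binomial and binomial cumulative distribution functions can be verified numerically. The sketch below assumes Y counts the trials needed for the r-th success, with r = 3 and p = 0.4 as implied by part a; the function names are illustrative.

```python
from math import comb


def negbin_cdf(y: int, r: int, p: float) -> float:
    """P(Y <= y): the r-th success occurs on or before trial y."""
    return sum(comb(k - 1, r - 1) * p**r * (1 - p)**(k - r) for k in range(r, y + 1))


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))


# Part a: P(Y <= 8) should equal 1 - P(X <= 2) with X ~ B(8, 0.4).
print(negbin_cdf(8, 3, 0.4), 1 - binom_cdf(2, 8, 0.4))
```

The two printed values should agree up to floating-point rounding, and the same comparison works for other choices of y, r and p.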


Exercise 3D 


15 45 
1 a > b vg 
40 40 
2 a 3 b 9 
3 ag b 0.1055 ec 24 
4 a 2 
b i 0.0280 ii 0.0250 
5 a 6 
b i 0.0669 ii 0.0498 
6 a 34 b 2° or 0.1975 
7 a Attempts are independent and probability of 
success is same in each attempt 
b ,1.75 @s.f) 
8 at b 0.0131 ¢ 96 
4 1575 125 
9 a x b = Cc d 0.4561 
10 a Negative binomial 
b i 25 ii 3 

c Probability of success is not constant 
50 

d 339 


Mixed exercise 3 


81 9 
1 a 77 0r 0.0791 b 7 
2 a Geo(0.1) b 10, 90 c 0.3138 
3 a Geometric b 0.0804 c 30 
d Throws are independent, probability is the same on 
each throw 
4 6 
5 a 0.001234 b 0.0012 
6 a 0.0655 b 0.0604 c 25, 10 d 2 
7 a 2 b 0.1003 ce 0.5941 d 0.1045 
8 a 0.65 x 0.35 = 0.2275 
b i 0.2786 ii 0.1741 iii 0.1717 iv 0.1811 
100 rr 700 
c Mean =~, Standard deviation = 75 
d 0.39 e 0.084 
f 0.009261 
Challenge 
1 a Negative binomial (2, p) 
b Both are geometric distributions 
c X = Y₁ + Y₂ 
d E(X) = E(Y₁) + E(Y₂) = 1/p + 1/p = 2/p 
2 X ~ Negative binomial(r, p) 


Allocate independent random variables Y₁, ..., Yᵣ with Y₁, ..., 
Yᵣ ~ Geo(p) such that X = Y₁ + ... + Yᵣ 


Then, E(X) = E(Y₁) + ... + E(Yᵣ) = 1/p + ... + 1/p = r/p 


Also, Var(X) = Var(Y₁) + ... + Var(Yᵣ) 


= (1 − p)/p² + ... + (1 − p)/p² = r(1 − p)/p² 
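
The mean r/p and variance r(1 − p)/p² can be checked against the negative binomial probabilities directly. This is a rough sketch that truncates the infinite sums at an assumed cut-off of 2000 trials, which is far more than needed for the example values; the function name `negbin_mean_var` is illustrative.

```python
from math import comb


def negbin_mean_var(r: int, p: float, max_trials: int = 2000):
    """Approximate E(X) and Var(X) where X is the number of trials to the r-th success."""
    mean = 0.0
    mean_sq = 0.0
    for k in range(r, max_trials + 1):
        prob = comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)   # P(X = k)
        mean += k * prob
        mean_sq += k * k * prob
    return mean, mean_sq - mean**2


# With r = 3 and p = 0.25 the formulas give mean 12 and variance 36.
print(negbin_mean_var(3, 0.25))
```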


Prior knowledge 4 


1 a 0.1563 b 0.4335 c 0.1079 d 4 
2 a 0.1052 b 0.8308 
c 270 (nearest whole number) 
3 x = 8 


Exercise 4A 


1 Reject H₀ 
2 Reject H₀ 
3 Fail to reject H₀ 
4 Fail to reject H₀ 
5 Reject H₀: Evidence suggests the mean number of misprints has increased. 
6 Reject H₀: Evidence suggests there is an increase in rate of accidents. 
7 Fail to reject H₀: There is no evidence of increase in the rate at which the coffee machine seizes up. 
8 Fail to reject H₀: There is no evidence to suggest the rate of sales has changed. 
9 Fail to reject H₀: There is no evidence to suggest a reduction in the rate of accidents occurring at the crossroads. 
10 Fail to reject H₀: There is no evidence to suggest the average number of flaws has changed. 
11 a 0.1606 b 0.8472 
c Fail to reject H₀: There is no evidence to suggest the mean number of breakdowns has decreased. 
12 Fail to reject H₀: There is no evidence to suggest a reduction in the number of times the doctor sees patients with the condition. 
13 Fail to reject H₀: There is no evidence to suggest the manager’s suspicion is correct. 
14 a i 0.1251 ii 0.2202 
b n large, p small 
c Fail to reject H₀: There is no evidence to suggest the servicing has reduced the number of defective components. 

Exercise 4B 


1 a Critical region: X ≤ 1; Significance level = 0.0266 
b Critical region: X ≥ 16; Significance level = 0.0082 
c Critical region: X ≥ 9; Significance level = 0.0214 
2 Critical region: X ≥ 16 
3 Critical region: X ≥ 14 
4 Critical region: X ≤ 2 
5 Critical region: X ≤ 1 
6 Critical region: X ≥ 13 
7 a Critical region: X = 0 or X ≥ 9; Significance level = 0.0397 
b Critical region: X ≤ 2 or X ≥ 15; Significance level = 0.0311 
c Critical region: X ≤ 3 or X ≥ 17; Significance level = 0.0326 
8 a Critical region: X ≤ 2 or X ≥ 14 
b 0.0419 
c X = 11 is not in the critical region, hence no evidence of a change in the rate. 
9 a Critical region: X ≤ 1 or X ≥ 10 
b 0.0722 
10 a E-mails arrive at random and at a constant average rate. 
b Critical region: X ≤ 3 or X ≥ 16 
c 0.0432 
d X = 13 is not in the critical region, hence no evidence to suggest the mean rate is different to 9. 
11 a c = 8 b 0.0311 
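
Critical regions such as those above can be found by searching the Poisson cumulative probabilities. The sketch below is an illustration under a stated assumption: each tail is made as large as possible while keeping its probability at most half the significance level. The function names are hypothetical, and λ = 9 is taken from the wording of answer 10 d.

```python
import math


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for X ~ Po(lam); returns 0 for negative x."""
    if x < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for k in range(1, x + 1):
        term *= lam / k
        total += term
    return total


def two_tailed_critical_region(lam: float, alpha: float = 0.05):
    """Return (lower, upper, level) for the region X <= lower or X >= upper.

    Assumed convention: each tail probability is at most alpha / 2.
    A lower value of -1 means the lower tail is empty.
    """
    lower = -1
    while poisson_cdf(lower + 1, lam) <= alpha / 2:
        lower += 1
    upper = 0
    while 1 - poisson_cdf(upper - 1, lam) > alpha / 2:
        upper += 1
    level = poisson_cdf(lower, lam) + (1 - poisson_cdf(upper - 1, lam))
    return lower, upper, level


print(two_tailed_critical_region(9))
```

For λ = 9 this gives the region X ≤ 3 or X ≥ 16, matching answer 10 b; the actual significance level is the sum of the two tail probabilities.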


Exercise 4C 


1 Not significant. Fail to reject H₀. 
2 Significant. Reject H₀. 
3 Significant. Reject H₀. 
4 Not significant. Fail to reject H₀. 
5 Significant. Reject H₀. 
6 Reject H₀. There is evidence that the probability of getting a 6 is less than #. 
7 Reject H₀. There is evidence that the probability of getting an A is less than +. 
8 a X ~ Geo(1/4) 
b 0.0791 
c Fail to reject H₀. There is no evidence to suggest the probability of Lucy scoring a goal from a free kick is now less than 1/4. 
9 Reject H₀. There is evidence the student’s suspicion is correct. 
10 Reject H₀. There is evidence that Wisetalk are over-stating their percentage. 
11 H₀: p = 0.3, H₁: p < 0.3. 
Reject H₀. There is evidence to suggest rival’s claim is correct. 
12 a X ~ Geo(1/6). 
Fixed probability of seeing a robin. The probability 
of seeing a robin on one day is independent of the 
probability of seeing a robin on another day. 
b i 0.1157 ii 0.4823 
c Reject H₀. There is evidence to suggest Imelda is over-stating the probability. 
Exercise 4D 


1 a Critical region: X ≥ 10 b 0.0404 
2 a Critical region: X ≥ 8 b 0.0490 
3 a Critical region: X ≤ 2 b 0.0975 
4 a Critical region: X ≥ 13 b 0.0434 
c As X = 11 is not in the critical region, we do not reject H₀. 
5 a Critical region: X ≥ 9 b 0.0390 
6 Critical region: X ≥ 5 


Challenge 


a Critical region: X ≤ 3 or X ≥ 409 b 0.0518 
c X = 5 is not in the critical region, hence do not reject H₀. 


Mixed exercise 4 
1 a 0.1575 b 0.3272 
c Reject Ho. There is evidence to suggest that the 
number of vehicles has reduced. 
2 Fail to reject Ho. There is no evidence to suggest a 
decrease in the number of deformed red blood cells. 
3 Fail to reject Hj. There isno evidence to suggest that 
the crosswords are more difficult. 
4 a 0.3025 
b_ Reject Hy. There is evidence to suggest that the 
meteorologist is correct. 
5 a 0.1708 b 0.3423 
c Fail to reject Hy. There is no evidence to suggest 
that Waldo has decreased the rate. 


6 a 0.1552 b 0.1424 
c 0.1031 or 0.1032 (using unrounded answer from 
part a) 
d X22 e 0.0296 


7 Fail to reject H₀. There is no evidence to suggest an
increase in sales.
8 a Fail to reject Hy. There is no evidence to suggest the 
rate of visits is greater on a Saturday. 
b X=15 
9 Fail to reject Hy. There is no evidence to suggest the 
percentage is higher than the manager thinks. 
10 a 0.0783 b 0.7225 c X2=20 d 0.0456 
11 a H₀: λ = 4, H₁: λ > 4 b X=9
c As X= 8 is not in the critical region, the scientist’s 
suggestion is rejected. 
12 a X<8 b Y=7 
c The probability that Alison has incorrectly rejected 
Hy is 0.0081 and Paul is 0.0156. 


Challenge 

a Negative binomial, successive trials each with the same 
probability of success where p is the number of trials 
needed for r successes. 


b Critical region: X <5 c 0.0426 


Prior knowledge 5 
1 a P(X> 115) = 1 -0.2660 = 0.7340 
b P(120 <X < 130) = 0.3944 


c a=114.60 
2 a -9.5 b 18 c 2 
3 0.2744 


Exercise 5A 
1 a 0.0072 
b Sample taken from a population that was normally 
distributed, so answer is not an approximation. 
2 a 0.2525 b 0.0098 ~ 0.0096 
3. a 0.0668 or 0.066807 b n= 241 


0.1855 

a 0.0416 

0.1103 

a k=0.15 b 0.1727 

c Answer is an approximation, n is large, so fairly 
accurate. 

Need n at least 1936 

a Salaries are unlikely to be symmetrically distributed
so normal distribution would not be a good model. 

b i 0.0231 ii 0.7804 

c Estimate likely to be inaccurate, small sample size 
and unknown if original distribution was normal. 

10 96 


b 0.0130 




Exercise 5B 


1 a 0.2084 

b_ 0.1807, this estimate is inaccurate, sample size not 
big enough 

2 a E(X) = 4, Var(X) = 12
b 0.1587 

3 0.9214 

4 a 10 b 0.0786 

5 a 0.1680 b 0.1587 

6 a 5 b 0.3085 

7 a 0.0352 b 0.9981 

8 a 0.0019 b 0.0416 

Mixed exercise 5 

1 0.0228 

2 0.1855 

3. ne=3 

4 0.1030 

5 a 0.1804 b 0.4191 

6 a 0.25 b 0.1855 

7 a 0.9171 
b Sample taken from a population that was normally 

distributed, so answer is not an estimate. 

c 0.7009 

8 0.8944 

9 n=7 

10 0.0014 

11 a
x | 0 | 1
P(X = x) | 0.4 | 0.6
E(X) = 0.6, Var(X) = 0.24

b 0.1709 
e n> 1025 

Challenge 


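$X_1 + \dots + X_n \sim N(n\mu, n\sigma^2)$ and so

$\bar{X} = \dfrac{X_1 + \dots + X_n}{n} \sim N\left(\dfrac{n\mu}{n}, \dfrac{n\sigma^2}{n^2}\right) = N\left(\mu, \dfrac{\sigma^2}{n}\right)$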


Review exercise 1 
P(X = x) 


1a x 


wil, ] w w w w w 
SIZ] So] Sl~] Ble] Sle] Sl- 


1 
2 
3 
4 
5 
6 
7/12 or 0.583 c 161/36 or 4.47
Var(X) = E(X²) − (E(X))² = 791/36 − 25921/1296 ≈ 1.97 (3 s.f.)
17.7 
1/17 or 0.0588 b 64/17 or 3.76
Var(X) = E(X²) − (E(X))² = 266/17 − 4096/289 ≈ 1.47 (3 s.f.)
13.3 
p+q=0.4, 2p+4q=1.3 
p=0.15,q=0.25 
1.75 
7.00 
p+q=0.45, 3p + 7q = 1.95 
p=0.3,q=0.15
0.35 d 7.15 el f 114.4 
01,04 b 1.5,1.41 ¢ 12.69 d 0.4 
x | -3 | -2 | 0 | 1 | 3
P(X = x) | 0.1 | 0.2 | 0.2 | 0.1 | 0.4
0.9 
X ~ Po(1.5) b 0.251 (3 s.f.) 
0.469 (3 s.f.) d 0.185 (3 s.f) 
Events occur at a constant rate. 
Events occur independently or randomly. 
Events occur singly. 
i 0.134 (3 s.f.) ii 0.715 (3 sf.) 
0.149 (3 s.f.) 
0.0816 b 0.1931 c 0.5673 
1.45; 1.4075 


Mean ≈ Variance

P(X > 2) = 0.4253 

If X ~ B(n, p) and

n is large

p is small

then X can be approximated by Po(np).

0.0001 

0.00098 

mean = np = 10 

variance = np(1 − p) = 9.9

0.870 (3 s.f.) 

0.226 (3 s.f.) 

λ = 3, if X ~ B(n, p) and n is large and p is small then X

can be approximated by Po(np).

X ~ B(200, 0.015) 

0.1693 

X ~ B(n, p) and n is large p is small then X can be 

approximated by Po(np). 

λ = 3.

P(X = 4) = 0.1680 

% error = 0.77 
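
The percentage error quoted above can be checked numerically. The sketch below assumes the comparison is between the exact binomial probability P(X = 4) for X ~ B(200, 0.015) and its Poisson approximation Po(3), and that the percentage error is taken from the values rounded to 4 d.p.; the helper names are illustrative only.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


n, p, k = 200, 0.015, 4
exact = round(binomial_pmf(k, n, p), 4)      # binomial value to 4 d.p.
approx = round(poisson_pmf(k, n * p), 4)     # Poisson approximation, lambda = np
# Percentage error computed from the rounded values (an assumption).
percent_error = round(abs(exact - approx) / exact * 100, 2)
print(exact, approx, percent_error)
```

Working from unrounded probabilities gives a slightly smaller percentage error, so the rounding convention matters here.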

Geo(0.05) b 20, 380 c 0.4877 

Geometric b 0.0531 e 90 

Attempts are independent and probability of 

success is constant. 

0.5811 b 16 spins 

0.0529 

Games are independent and the probability of 

winning is constant. 

22.2; 10.1 (both 3 s.f.) 

0.15 

0.3 b 0.0889 c 0.0467 

i A hypothesis test is a mathematical procedure
to examine a value of a population parameter
proposed by the null hypothesis H₀, compared to
the alternative hypothesis H₁.


20 a 


21 




22 a 




ii The critical region is the range of values of a test 
statistic that would lead you to reject Ho. 

λ < 0.45 critical region is 3 or fewer (X ≤ 3) calls in

the 20-minute period.

λ > 0.45 critical region is 16 or more (X ≥ 16) calls

in the 20-minute period.

4.33% 

P(X < 1) = 0.0611 > 0.05 so result is not significant, 

do not reject Hp. 

P(X = 11) = 0.138 > 0.05 so result is not significant, 

do not reject Hp. 

Would reject Hy at the 15% significance level. 

16 or fewer, 33 or more 

10.3% 

18 does not lie in critical region so there is no 

evidence at the given level of significance that the 

mean rate of orders is different from that claimed. 

X ~ Geo(0.2) 

0.0440 < 0.05 so there is evidence that Mr Taylor’s

suspicion is valid.


23 H₀: p = 0.5 H₁: p < 0.5
Assume H₀, so that X ~ Geo(0.5)
Significance level 10%
P(X ≥ 5) = (1 − 0.5)⁴ = (0.5)⁴ = 0.0625
0.0625 < 0.1
There is sufficient evidence to reject H₀, and conclude
that Xander is overestimating his shooting accuracy.
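
The tail probability used in this test can be reproduced in a few lines. This is a minimal sketch, assuming the observed value is 5 and using the fact that X ≥ k for a geometric variable means the first k − 1 trials all fail; the function name is illustrative.

```python
def geometric_tail(k, p):
    """P(X >= k) for X ~ Geo(p): the first k - 1 trials must all fail."""
    return (1 - p) ** (k - 1)


p_value = geometric_tail(5, 0.5)     # P(X >= 5) under H0: p = 0.5
significance = 0.10                  # 10% significance level
reject_h0 = p_value < significance
print(p_value, reject_h0)
```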


24 a X<102 
b 115 is not in the critical region so there is no 
evidence to doubt Brian’s claim. 
25 a N(90, 0.25) 
Application of central limit theorem (as sample 
large) 
b 0.0228 
26 0.4875 
27 a 0.1 
b 0.5539 
c Accurate since n is large.
28 a 0.0699 b 0.0810 
c 0.0787; values are close so the central limit
theorem does provide a reasonable estimate.
29 0.2593 
30 a 0.0443 b 20 c 0.3325 
Challenge 
1 al/x 0 1 2 3 
P(X = 4) cs 32 a3 2 
b E(X)=UaxPX=x)=Oxt+1x f+ 2x 243 
Xa e 


30a 
b 


Each term is (7) pg 

$E(X) = \sum x P(X = x) = 0q^4 + 4q^3p + 12q^2p^2 + 12qp^3 + 4p^4, \quad q = 1 - p$

$= 4p - 12p^2 + 12p^3 - 4p^4 + 12p^2 - 24p^3 + 12p^4 + 12p^3 - 12p^4 + 4p^4 = 4p$

$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = 0q^4 + 4q^3p + 24q^2p^2 + 36qp^3 + 16p^4 - 16p^2, \quad q = 1 - p$

$= 4p - 12p^2 + 12p^3 - 4p^4 + 24p^2 - 48p^3 + 24p^4 + 36p^3 - 36p^4 + 16p^4 - 16p^2$

$= 4p - 4p^2 = 4p(1 - p)$
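
As a sanity check on the algebra, the mean and variance of B(4, p) can be computed directly from the probability function with exact fractions. This is only a sketch under the assumption that the distribution in question is B(4, p); the function name and the sample values of p are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (E(X), Var(X)) for X ~ B(n, p), summed from the pmf."""
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * pr for x, pr in enumerate(probs))
    second_moment = sum(x * x * pr for x, pr in enumerate(probs))
    return mean, second_moment - mean**2


# Compare with the closed forms 4p and 4p(1 - p) for a few exact values of p.
for p in (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3)):
    mean, variance = binomial_moments(4, p)
    assert mean == 4 * p
    assert variance == 4 * p * (1 - p)
```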

X<10 

2.39% 
Prior knowledge 6


1 
2. 
3 


0.4043 

0.132 

H₀: p = 0.6, H₁: p ≠ 0.6
Here n is large and p is near 0.5 so a normal
approximation can be used.
B(100, 0.6) ~ N(60, 24)

$P(X \geq 70) \approx P(W > 69.5) = P\left(Z > \dfrac{69.5 - 60}{\sqrt{24}}\right) = P(Z > 1.94) = 0.026$

0.026 > 0.025, therefore we fail to reject H₀. There is
no evidence that David is wrong.
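
The normal approximation with continuity correction can be checked with the standard library. This sketch assumes the observed value is 70 and that the test is two-tailed at the 5% level, so the tail probability is compared with 0.025.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.6
# B(100, 0.6) approximated by N(np, np(1 - p)) = N(60, 24).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Continuity correction: P(X >= 70) is approximated by P(W > 69.5).
p_value = 1 - approx.cdf(69.5)
reject_h0 = p_value < 0.025   # one tail of a two-tailed 5% test
print(round(p_value, 3), reject_h0)
```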


Exercise 6A 


1 


H₀: There is no difference between the observed and

expected distributions.

H₁: There is a difference between the observed and

expected distributions.

a H₀: The observed distribution is the same as the
discrete uniform distribution.
H₁: The observed distribution is not the discrete
uniform distribution.

b X² = 1.6

a H₀: There is no difference between the observed
distribution and the discrete uniform distribution.
H₁: There is a difference between the observed
distribution and the discrete uniform distribution.

b 150 c X² = 14.33


a | Mutation present Yes No 


Expected frequency | 120 | 40 


b H₀: There is no difference between the observed
and expected distributions.
H₁: There is a difference between the observed and
expected distributions.


c X² = 0.3

a | Result H T 
Expected frequency for fair coin 25 | 25 
Expected frequency for biased coin | 30 | 20 


b XX}, = 0.72, Xf, = 0.33 

c Since a lower goodness of fit score is better, it is 
more likely John was flipping the biased coin. 

Goodness of fit for Welsh men: 2.074, goodness of fit 

for Welsh women: 4.076. Therefore the distribution for 

English adults is a closer match for Welsh men than 

women. 


Exercise 6B 


1 6 degrees of freedom (7 observations, 1 constraint) 

2 11.070 

3. a 11.070 b 20.090 c 15.987 

4 18.307 

5 13.362 

6 1.646 

7 1.145 

8 a 5.226 b 21.026 

9 a Combine x = 4 and x = 5 cells into a x = 4 cell,
so that expected value is 6.25 > 5.

b With 4 cells there are 3 degrees of freedom

(4 observations, 1 constraint), and so χ²₃ is a
suitable distribution to model the goodness of fit.
We have χ²₃ (1%) = 11.345.
Exercise 6C


1 


We calculate X² = 4.33. There are 5 degrees of freedom.
χ²₅ (5%) = 11.070, so there is insufficient evidence to
reject the null hypothesis.

We expect 24 winning tickets and 96 losing.

X² = 4.21875, whereas χ²₁ (5%) = 3.841, so we reject

the null hypothesis, the tombola is unfair.

X² = 8.05, whereas χ²₂ (2.5%) = 7.378, so we reject the

null hypothesis, the expected distribution does not fit

the data.

a Group together observations for ‘4 dogs’, ‘5 dogs’
and ‘>5 dogs’, so that the expected frequency
exceeds 5. There are then 5 observations, and 1
constraint, so 4 degrees of freedom.

b X² = 12.236, whereas χ²₄ (5%) = 9.488. Therefore
we reject the null hypothesis, expected distribution
doesn’t fit the data.

X² = 737.6, whereas χ²₅ (5%) = 11.071, so we reject the

null hypothesis that the distribution from 2000 is a

good model for the data from 2015.


Exercise 6D 


1 


a H₀: the data may be modelled by Po(2).
χ²₅ (5%) = 11.070,
X² = 4.10
No reason to reject H₀.
b Reduction by 1
Expected values 17, H₀: deliveries are uniformly
distributed.
χ²₅ (5%) = 11.070, X² = 5.765
No reason to reject H₀.
a 1.4
b χ²₂ (10%) = 4.605, X² = 5.04
Reject H₀.
These data do not come from a Poisson distribution
with λ = 1.4.
a 0.4
b χ²₂ (5%) = 5.991, X² = 3.19
No reason to reject H₀.
Expected values: 21.6, 16.2, 27, 5.4, 10.8
χ²₄ (5%) = 9.488, X² = 1.84
No reason to reject H₀.
The number of accidents might well be constant at
each factory.
λ = 3.45, χ²₄ (5%) = 9.488, X² = 0.990
No reason to reject H₀.
There is not sufficient evidence to suggest the data are
not modelled by Po(3.45).
a Breakdowns are independent of each other, occur
singly at random and at a constant rate.
b λ = 0.95, H₀: the data can be modelled by Po(0.95)
Expected values: 38.67, 36.74, 17.45, 7.14
χ²₂ (5%) = 5.991, X² = 16.04.
Reject H₀. The breakdowns are not modelled by
Po(0.95).
H₀: prizes are uniformly distributed
H₁: prizes are not uniformly distributed
χ²₉ (5%) = 16.919, X² = 10.74
Do not reject H₀. There is no reason to believe the
distribution of prizes is not uniform.
a R = 43.75 S = 54.69 T = 43.75
b H₀: A binomial model is a suitable model
H₁: A binomial model is not a suitable model
X² < c.v. so accept H₀.
10


11 


Conclude no reason to doubt data are from 
B(8, 0.5). 

c Mean would have to be calculated, an extra 

restriction. 

c.v. would be χ²₅ (5%) = 11.070.

X² < c.v. so no change in conclusion.

Unbiased estimator of variance = 2.4 

Mean is close to variance. 

$= 27.2 t=78.4 

H₀: the data are from Po(2.4)

H₁: the data aren’t from Po(2.4)

3.5 

This expected frequency of 3.5 < 5 so must be 

combined with E(X = 6) to give class ‘6 or more goals’ 

which now has expected frequency 7.2 + 3.5 = 10.7 

We now have 7 classes after pooling and 

2 restrictions so degrees of freedom = 7 —- 2=5 

g X² = 15.7 c.v. = 11.070
X² > c.v. so reject H₀.

Conclude there is evidence that the data cannot be
modelled by Po(2.4).

a #3 =2.59(2d.p.) 

b It is assumed that plants occur at a constant average
rate and occur independently and at random in the
meadow.

c s = 37.24 (2 d.p.)

d X² < c.v. so accept H₀.
Conclude there is no reason to doubt the data can
be modelled by Po(2.59).




t = 2.50 (2 dp.) 


Exercise 6E 


1 
2 


ν = 2, χ²₂ (5%) = 5.991

H₀: Ownership is not related to locality

H₁: Ownership is related to locality

χ²₂ (5%) = 5.991, X² = 13.1

Reject H₀.

a (3 − 1)(3 − 1) = 4

b χ²₄ (5%) = 9.488.
Reject H₀. There is an association between groups
and grades.

H₀: There is no relationship between results

χ²₄ (5%) = 9.488, X² = 8.56

Do not reject H₀. There is no reason to believe there is

a relationship between results.

χ²₂ (5%) = 5.991, X² = 1.757

Do not reject H₀. There is no evidence to suggest

association between station and lateness.

χ²₄ (1%) = 13.277

Reject H₀. Gender and grade appear to be associated.


a
Observed | A | B | Total
OK | 47 | 28 | 75
Defective | 13 | 12 | 25
Total | 60 | 40 | 100

Expected | A | B
OK | 45 | 30
Defective | 15 | 10

b H₀: Factory and quality are not associated.
H₁: Factory and quality are associated.
χ²₁ (0.05) = 3.841,
Σ (Oᵢ − Eᵢ)²/Eᵢ = 0.8888
0.8888 < 3.841
Do not reject H₀. There is no evidence of an association between
the factory involved and quality.
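
The expected frequencies and the test statistic for a contingency table follow mechanically from the observed counts. The sketch below uses the observed factory table (47, 28, 13, 12); the function name is illustrative, and the critical value 3.841 is taken from the answer above rather than computed.

```python
def chi_squared_independence(observed):
    """Return (expected table, X^2 statistic) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected frequency = row total x column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, statistic


expected, x2 = chi_squared_independence([[47, 28], [13, 12]])
print(expected, round(x2, 4))
print("reject H0" if x2 > 3.841 else "do not reject H0")
```

The same function applies to any r × c table; only the critical value changes with the degrees of freedom (r − 1)(c − 1).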


10 


11 


12 




H₀: Gender and susceptibility to flu are not associated.
H₁: Gender and susceptibility to flu are associated.

Observed | Boys | Girls | Total
Flu | 15 | 8 | 23
No flu | 7 | 20 | 27
Total | 22 | 28 | 50

Expected | Boys | Girls
Flu | 10.12 | 12.88
No flu | 11.88 | 15.12

χ²₁ (5%) = 3.841, X² = 7.78

Reject H₀. There is evidence for an association between

gender and susceptibility to influenza.

χ²₂ (5%) = 5.991, X² = 27.27

Reject H₀. There is evidence of an association between

the gender of an organism and the beach on which it

is found.

H₀: There is no association between age and number of

credit cards.

H₁: There is an association between age and number of

credit cards.

χ²₁ (5%) = 3.841, X² = 8.31

Reject H₀. There is an association between age and the

number of credit cards possessed.

a H₀: There is no association between gym and
whether or not a member got injured.
H₁: There is an association between gym and
whether or not a member gets injured.

b (34/865) × 175 = 6.88 (2 d.p.)

c Expected frequencies are
Gym | A | B | C | D
Expected injured | 9.32 | 10.14 | 6.88 | 7.66
Expected uninjured | 227.68 | 247.86 | 168.12 | 187.34

From which we calculate X² = 7.732
There are 3 degrees of freedom and χ²₃ (5%) = 7.815, so we do
not reject the null hypothesis, there is not enough
evidence at the 5% significance level to think that
any gym is more dangerous than the others.

a H₀: There is no association between science studied
and pay.

H₁: There is an association between science studied
and pay.

b The expected frequencies are

Science studied | Salary £0-£40k | £40k-£60k | >£60k
Biology | 70.19 | 26.40 | 7.41
Chemistry | 72.89 | 27.42 | 7.69
Physics | 74.92 | 28.18 | 7.90

From which we calculate X² = 2.031.

The number of degrees of freedom is

(3 − 1) × (3 − 1) = 4 and χ²₄ (5%) = 9.49, therefore
we do not reject the null hypothesis, there is not
enough evidence at the 5% significance level to
think that the subject studied has an effect on pay.
Exercise 6F
1 The expected frequencies are (grouping to ensure each
expected frequency is at least 5)

k | 1 | 2 | 3 | 4 | ≥5
Expected frequency | 180 | 72 | 28.8 | 11.52 | 7.68

From which we calculate X² = 14.705.

χ²₄ (1%) = 13.277 therefore we reject the null hypothesis.


2 The expected frequencies are
k | 1 | 2 | 3 | 4 | 5 | ≥6
Expected frequency | 40 | 24 | 14.4 | 8.64 | 5.184 | 7.776

From this we calculate X² = 7.966.
χ²₅ (5%) = 11.071, so we do not reject the null
hypothesis. Geo(0.4) is a good model for the data.
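
The expected frequencies for a geometric model can be generated directly from the probability function, with the final class absorbing the whole upper tail. This sketch assumes a sample of 100 observations (inferred from the first expected frequency, 40 = 100 × 0.4) and a pooled last class k ≥ 6; the function name is illustrative.

```python
def geometric_expected(n, p, last_class):
    """Expected frequencies for k = 1, ..., last_class - 1 and k >= last_class."""
    freqs = [n * p * (1 - p) ** (k - 1) for k in range(1, last_class)]
    # P(X >= last_class) = (1 - p)^(last_class - 1): the pooled tail.
    freqs.append(n * (1 - p) ** (last_class - 1))
    return freqs


expected = geometric_expected(100, 0.4, 6)
print([round(e, 3) for e in expected])
```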


3 a We have $\dfrac{1}{p} = \dfrac{1}{100} \sum k \times O_k = 1.62$, so p = 0.617.
b The expected frequencies are
k | 1 | 2 | 3 | ≥4
Expected frequency | 61.7 | 23.63 | 9.05 | 5.62

From these we calculate X² = 0.901. There are 2
degrees of freedom (since we estimated p from the
data) and χ²₂ (5%) = 5.991, so we do not reject the
null hypothesis. Geo(0.617) is a good model for the
data.
4 a We have $\dfrac{1}{p} = \dfrac{1}{100} \sum k \times O_k = 1.35$, so p = 0.741.
b The expected frequencies are

k | 1 | 2 | ≥3
Expected frequency | 74.1 | 19.19 | 6.71

Where we have grouped the observations for 3 and
above to ensure the expected values are all at least
5. From this we can calculate X² = 0.312. There is
only one degree of freedom, and χ²₁ (2.5%) = 5.024.
Therefore we do not reject the null hypothesis,
Geo(0.741) is a good model for the data.

5 a Assuming consecutive letters are independent
and equally likely, we could model the number of
characters until the next vowel using a Geo(5/26)
distribution.

b The expected frequencies are
k | 1 | 2 | 3 | 4 | 5 | ≥6
Expected frequency | 14.42 | 11.65 | 9.41 | 7.60 | 6.14 | 25.78

From this we calculate X² = 2.418. There are 5
degrees of freedom, and χ²₅ (5%) = 11.071, so we
do not reject the null hypothesis, the data could be
modelled by a Geo(5/26) distribution.

c_ e.g. Experiment does not tell us anything about the 
distribution within either the consonants or the vowels 


Challenge 

The number of people is the number of trials until 10 
successes with fixed probability of success, so a negative 
binomial distribution is a natural choice of model. 

Hp: The data can be modelled by a negative binomial 
distribution 

H,: The data can not be modelled by a negative binomial 
distribution 


We estimate the parameter p from the data 
1X 1 1 

=e t=— x S122 

pT 10 * Tog aE * % ss 

so p = 0.814. 

Using this, we calculate the expected frequencies 


Number of 10 11 12 13 14 | 315 
people 

Expected | 43 9 | 24.71 | 25.27 | 18.80 | 11.37 | 10.57 
frequency 


From which we can calculate X² = 3.32. Since χ²₄ (5%)

= 9.49 (4 degrees of freedom, 6 observations and 2
constraints, i.e. 1 parameter estimation and 1 total) there
is not enough evidence to reject the null hypothesis. We
can model the data with a negative binomial distribution.


Mixed exercise 6 


1 23.209 

2 15.507 

3 v=8, critical region x? > 15.507 

4 v=6,12.592 

5 H₀: Taking drug and catching a cold are independent
(not associated)
H₁: Taking drug and catching a cold are not
independent (associated)

Σ (O − E)²/E = 2.53

ν = 1, χ²₁ (5%) = 3.841 > 2.53
No reason to believe that the chance of catching a cold
is affected by taking the new drug.
6 H₀: Poisson distribution is a suitable model
H₁: Poisson distribution is not a suitable model
From these data λ = 52/80 = 0.65

Expected frequencies 41.76, 27.15, 8.82, 2.27
(the last two classes are combined to give 11.09)

α = 0.05, ν = 3 − 1 − 1 = 1; critical value = 3.841

Σ (O − E)²/E = 1.312

Since 1.312 is not in the critical region there is insufficient
evidence to reject H₀ and we can conclude that the
Poisson model is a suitable one.
7 27.5, 22.5; 27.5, 22.5
$\sum \dfrac{(O - E)^2}{E} = \dfrac{(23 - 27.5)^2}{27.5} + \dots + \dfrac{(18 - 22.5)^2}{22.5} = 3.27$
α = 0.10 ⇒ critical region χ² > 2.705
3.27 > 2.705
Since 3.27 is in the critical region there is evidence of
association between gender and test result.
8 a Each box has an equal chance of being opened, so we
would expect each box to be opened 20 times.
b χ²₄ (5%) = 9.488, X² = 2.3
No reason to reject H₀. A discrete uniform
distribution could be a good model.
9 a 0.72
b χ²₂ (5%) = 5.991, X² = 2.62
No reason to reject H₀. The B(5, 0.72) could be a
good model.
10 λ = 0.654, ν = 2, X² = 21.506,
χ²₂ (5%) = 5.991. Reject H₀.
Po(0.654) distribution is not a suitable model.
11 χ²₂ (5%) = 5.991, X² = 4.74...
12 a 4.28
b χ²₄ (5%) = 9.488, X² = 1.18
No reason to reject Hp. Po(4.28) could be a good From which we calculate X? = 14.914. There are four
model. degrees of freedom and yj (5%) = 9.488, so we reject 
13 y7 (5%) = 3.841. X* = 10.42. Reject Ho. the null hypothesis. The evidence suggests Wilfred 
There is evidence to suggest association between left- may not be equally likely to pick any of the numbers 
handedness and gender in this population. 2,5 0r 7. 
14 a H,: There is no association between gender and 
preferred subject. Challenge 
H,: There is an association between gender and a_ Using the midpoints of each range, we calculate the mean 
preferred subject. and variance. Mean = 14.14, Var = 17.11 
b (28 + 40) x (45 + 40 + 45) ~ 29,47 b We calculate expected frequencies as follows 
alt Length of | /<5 |5<i<] 10<2| 15</ | 20<! 
c The expected frequencies are call (1) 10 <15 < 20 
alah Expected | ¢ 7g | 72.44 | 211.95 | 169.68 | 39.14 
Physics | Biology | Chemistry frequency 
Male 67.43 38.53 64.03 The null and alternative hypotheses are: 
Gender Hp: The call length can be modelled by a normal distribution 
Female [51.57 | 29.47 | 48.97 H,: The call length can’t be modelled by a normal distribution 
From which we calculate X2 = 8.685. From this we can calculate X? = 3.242. There are 2 
d_ There are (3 - 1) x (2 - 1) = 2 degrees of freedom, degrees of freedom, since there are 5 observations, we 
and y? (1%) = 9.21. Therefore we do not reject the estimated 2 parameters, and we have the constraint of 
hypothesis. the total summing to 500. Thus the critical value is \3 
e Since y3 (5%) = 5.991, we would reject the null (5%) = 5.991. Hence we do not reject the null hypothesis, 
hypothesis at the 5% significance level. the length of call can be modelled by a normal 
15 a i P(¥=1)~ 0.2504 distribution. 
ii PY> 2) 20.3639 
b Sk x O,= Gp = 2.15 Prior knowledge check 7 
€ a= 16.15, 6 = 1.36 1 a 0.1708 b 0.3423 
d H,: The data can be modelled by a Poisson c 45 d 4.5 
distribution with mean 2.15. 2 a 0.0720 b 0.49 
H,: The data cannot be modelled by a Poisson ¢ 10 aq 70 
distribution with mean 2.15. 3 
e The chi-squared test is not very effective if the 3 a es b eee 
expected values are below 5, so we combine aaa d “75 
expected values in order to make sure each of the 4 a 4e*(3 + e*)* b 2e?*(1 - 2x? - 2x) 


expected values are at least 5. 
f The test statistic is X* = 2.507. There are 3 degrees 
of freedom (5 observations, and 2 constraints since 


Exercise 7A 


we estimated the mean from the data). From the loa 0, 1,2 . 
tables we see y3 (5%) = 7.815. Therefore we do not b i 0.3 ii 1 
reject the null hypothesis. 2 a 0, ri 2,3 a 
16aA asad distribution would be a good choice. big og 
a 524 mw 255 3 a O b 0.8 c 0.2 
» We have 5% as Dk * O1 = 355" 80D Sa4° 4 t+ P+ Bs tte + 09) 
c The ex a values are 2 
P 5 5b + Ht? + #3 + it 
Number 1 9 3 | 4 5 | +6 6 a Ate+ 2e? + 3t + 4t49 b A(t + 40? + 92°) 
of calls Fi Fi 
Expected i 8 46 4 
xpectec’ |124.09|63.70|32.70| 16.79 | 8.62 | 9.09 8 at 
frequency 25 
From which we calculate X? = 14.84. There are 4 b |x 0 1 2 3 4 
degrees of freedom and y3(5%) = 9.488, so we reject P(X =x) 1 8 4 
the null hypothesis, the data is not well modelled by 25 | 25 | 25 | 25 | 25 
a geometric distribution. 
17 a e.g. It is unlikely that each guess will be J a |x 2 3 4 3 6 7 8 
independent of the others. P(X = x) 1 Z 3 7 3 7 i 
b_ If Wilfred is equally likely to select the digits 2, 5 2 oo 
or 7, then he has a 4 chance of getting the right b 7(t? + 207 + 304 + 40° + 30° + 207 + #8) 
number each time he calls his parents. So the 10 G,1)# : 
number of calls can be modelled by a Geo(;) da k= b 10: c £ d@ Y~B10.0.5 
distribution. The expected values are 1024 To2t 256 eee) 
Attempts 1 2 3 4 S5 Exercise 7B 
4 6 5 
rae 17.33111.56| 7.7 | 5.14 |10.27 1 a (0.5+0.5t) b (0.8 + 0.20) ce (0.1 + 0.90) 
requency d ee) e elev f  e020-D 


0.3t 
1-0.7¢ ’ 


| 0.9¢ 
1-0.1¢ 


(0.8+0.20)° b 


0.8 
1— 0.2% . | 


0.4¢ ) 


1-0.6¢ 


30a 


0.2t 
10.8% = | 


0.2222 c 
0.0406 


0.2t ) 
1 —0.8¢ 


e931 


0.35¢ 
1-0.65¢ 


4 a X~ Po(0.3) 
a Geo(0.35) b 


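6 X ~ B(4, 0.8)
The probability distribution of X:
$0.2^4$, $4 \times 0.8 \times 0.2^3$, $6 \times 0.8^2 \times 0.2^2$, $4 \times 0.8^3 \times 0.2$, $0.8^4$
$G_X(t) = \sum t^x P(X = x)$
$G_X(t) = 0.2^4 + 4 \times 0.8 \times 0.2^3 t + 6 \times 0.8^2 \times 0.2^2 t^2 + 4 \times 0.8^3 \times 0.2\, t^3 + 0.8^4 t^4$
$G_X(t) = (0.2 + 0.8t)^4$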
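
The agreement between the term-by-term sum and the closed form can be checked numerically. This is a small sketch, not part of the printed solution: it builds the pmf of B(4, 0.8), evaluates the sum of t^x P(X = x) at a few arbitrary values of t, and compares with (0.2 + 0.8t)^4.

```python
from math import comb


def pgf_from_pmf(pmf, t):
    """Evaluate G(t) = sum of t^x * P(X = x), with pmf[x] = P(X = x)."""
    return sum(prob * t**x for x, prob in enumerate(pmf))


n, p = 4, 0.8
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# The test points are arbitrary; any t gives the same agreement.
for t in (0.0, 0.5, 1.0, 1.5):
    assert abs(pgf_from_pmf(pmf, t) - (0.2 + 0.8 * t) ** 4) < 1e-12
```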
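7 X ~ Po(3.5)

$P(X = x) = \dfrac{e^{-3.5} \times 3.5^x}{x!}$

$G_X(t) = \sum t^x P(X = x) = \sum t^x \left(\dfrac{e^{-3.5} \times 3.5^x}{x!}\right)$

$= e^{-3.5} \sum \dfrac{(3.5t)^x}{x!} = e^{-3.5}\left(1 + 3.5t + \dfrac{(3.5t)^2}{2!} + \dfrac{(3.5t)^3}{3!} + \dots\right)$

The Maclaurin expansion of $e^x$ with $x = 3.5t$
$= e^{-3.5} \times e^{3.5t} = e^{3.5(t - 1)}$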


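8 Y ~ Geo(0.7)
$P(Y = y) = 0.3^{y-1} \times 0.7$
$G_Y(t) = \sum t^y P(Y = y) = \sum t^y \times 0.3^{y-1} \times 0.7$
$= 0.7t + 0.3 \times 0.7t^2 + 0.3^2 \times 0.7t^3 + 0.3^3 \times 0.7t^4 + \dots$
$= 0.7t(1 + 0.3t + (0.3t)^2 + (0.3t)^3 + \dots)$
The infinite geometric sum with first term 1 and
common ratio 0.3t

$= 0.7t\left(\dfrac{1}{1 - 0.3t}\right) = \dfrac{0.7t}{1 - 0.3t}$

9 X ~ B(n, p)

$P(X = x) = \binom{n}{x} \times p^x \times (1 - p)^{n-x}$
$G_X(t) = \sum t^x P(X = x) = \sum t^x \times \binom{n}{x} \times p^x \times (1 - p)^{n-x}$
$= \sum \binom{n}{x} \times (pt)^x \times (1 - p)^{n-x}$

$= \left(\binom{n}{0} \times (pt)^0 \times (1 - p)^n\right) + \left(\binom{n}{1} \times (pt)^1 \times (1 - p)^{n-1}\right) + \dots + \left(\binom{n}{n} \times (pt)^n \times (1 - p)^0\right)$

Binomial expansion of $(a + b)^n$ with $a = 1 - p$ and $b = pt$
Hence $G_X(t) = (1 - p + pt)^n$

10 X ~ Po(λ)

$P(X = x) = \dfrac{e^{-\lambda} \times \lambda^x}{x!}$

$G_X(t) = \sum t^x P(X = x) = \sum t^x \left(\dfrac{e^{-\lambda} \times \lambda^x}{x!}\right)$

$= e^{-\lambda}\left(1 + \lambda t + \dfrac{(\lambda t)^2}{2!} + \dfrac{(\lambda t)^3}{3!} + \dots\right)$

The Maclaurin expansion of $e^x$ with $x = \lambda t$

Hence $G_X(t) = e^{-\lambda} \times e^{\lambda t} = e^{\lambda(t - 1)}$

11 Y ~ Geo(p)
$P(Y = y) = (1 - p)^{y-1} \times p$
$G_Y(t) = \sum t^y P(Y = y) = \sum t^y \times (1 - p)^{y-1} \times p$
$= pt + (1 - p) \times pt^2 + (1 - p)^2 \times pt^3 + (1 - p)^3 \times pt^4 + \dots$
$= pt(1 + (1 - p)t + ((1 - p)t)^2 + ((1 - p)t)^3 + \dots)$
The infinite geometric sum with first term 1 and
common ratio (1 − p)t

Hence $G_Y(t) = \dfrac{pt}{1 - (1 - p)t}$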


Exercise 7C 


1 


2 
3 
4 


Oo 


10 


11 


= 
ho 


= 
Bm 


3, 11 
2 16 


; 41( = 1.067) (3 d.p.) 
(0.5 + 0.50) b 34 
(0.4 + 0.6t)* 


B 
B 


ii 2( = 0.980) 3s.) 


3 


Sf ols BB ol 
a 
No 
uN 


9 
2, 22(— 0.943) (3 s.f) 
1 


9 
16 
e «9 
N32 
a 2;4 
- 1 ue oes 
ig ii = iii O 
iv i 
2e 1 
i cH 
a X~ Geol) b 
1-36 
c i6 ii 30 
X ~ B(n, p)
Hence has a pgf: $G_X(t) = (1 - p + pt)^n$

$G'_X(t) = np(1 - p + pt)^{n-1}$

$E(X) = G'_X(1) = np(1 - p + p)^{n-1} = np$
$\mathrm{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$
$G''_X(t) = p^2 n(n - 1)(1 - p + pt)^{n-2}$
$G''_X(1) = p^2 n(n - 1)$
Substitute into equation for Var(X)

$\mathrm{Var}(X) = p^2 n(n - 1) + np - (np)^2 = p^2 n^2 - p^2 n + np - n^2 p^2$
$= np - np^2 = np(1 - p)$
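X ~ Po(λ)
Hence the pgf is $G_X(t) = e^{\lambda(t - 1)}$
$E(X) = G'_X(1)$
$G'_X(t) = \lambda e^{\lambda(t - 1)}$
$G'_X(1) = \lambda e^0 = \lambda$
$\mathrm{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$
$G''_X(t) = \lambda^2 e^{\lambda(t - 1)}$
$G''_X(1) = \lambda^2 e^0 = \lambda^2$
Substitute into equation for Var(X):
$\mathrm{Var}(X) = \lambda^2 + \lambda - (\lambda)^2 = \lambda$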
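a X ~ Po(4) b $e^{4(t - 1)}$
c i $E(X) = G'_X(1)$
$G'_X(t) = 4e^{4(t - 1)}$
By substituting t = 1 into $G'_X(t)$: $E(X) = 4e^0 = 4$
ii $\mathrm{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$
$G''_X(t) = 16e^{4(t - 1)}$
By substituting t = 1 into $G''_X(t)$: $G''_X(1) = 16e^0 = 16$
Hence, $\mathrm{Var}(X) = 16 + 4 - (4)^2 = 4$
standard deviation = $\sqrt{\mathrm{Var}(X)} = 2$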
a=4,b=5 
ax b X~B(4, 0.5) 
c $G_X(t) = \frac{1}{16}(1 + t)^4$
$G'_X(t) = \frac{1}{4}(1 + t)^3$
$E(X) = G'_X(1) = \frac{1}{4}(1 + 1)^3 = 2$
$G''_X(t) = \frac{3}{4}(1 + t)^2$
$G''_X(1) = \frac{3}{4}(1 + 1)^2 = 3$
$\mathrm{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$
$\mathrm{Var}(X) = 3 + 2 - 2^2 = 1$
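
For a distribution with finitely many values, the pgf is a polynomial whose coefficients are the probabilities, so G'(1) and G''(1) are simple weighted sums. The sketch below applies this to the pgf (1/16)(1 + t)^4 of B(4, 0.5); the function name is illustrative and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb


def moments_from_pgf(coeffs):
    """coeffs[x] is the coefficient of t^x in G(t). Return (E(X), Var(X))."""
    g1 = sum(x * c for x, c in enumerate(coeffs))            # G'(1)
    g2 = sum(x * (x - 1) * c for x, c in enumerate(coeffs))  # G''(1)
    return g1, g2 + g1 - g1**2


# Coefficients of (1/16)(1 + t)^4, i.e. the pmf of B(4, 0.5).
coeffs = [Fraction(comb(4, x), 16) for x in range(5)]
mean, variance = moments_from_pgf(coeffs)
print(mean, variance)
```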

15 a g(11e+ 90? + 709 + 5t* + 305+ 6) 
.. V2555 


xD)? 


- 91 
biz ii 36 


Challenge 
a 0 


bk+1 cy 


Exercise 7D 


5 1 11 1 
Loa s+ q+ at t+ qt? 
b EZ)=G,V=5 +30) + Ba? + 03 = 28 
E(X) = Gy) = 2 
E(Y) = Gy) = s 
EX) + EY) =3+4=2 = EZ) 
2 a X~ Be, 0.5); ¥~ B(2, 0.6) 
1 37 
b 25 ‘i st + Too! + of * tappe 
ce ElZ)=G;(1)= 1 EX) G,(1) = 1, E(Y) = G(1) = $ 
EX) + E(Y)=1+2=U= EZ) 
3 a else, e2-40-) 
b e387) 


e EZ)=G,)=2, EX) =G,=2, EY) =Gy) = 2 


EX) + EY) = 3+ 2-2 = EZ) 
4 a X~ Negative B(1, 4), Ga 
1 1 


=( pt Vz ed 
1-1-pt 3 
1-<t 

6 


Multiply fraction through by 6: Gd) = 


1, \’ — 
b 10 =( re 
ter i 


5 


t 
— 5t 


6 “allt : al 


$E(Z) = E(X) + E(Y) = G'_X(1) + G'_Y(1) = 6 + 20 = 26$
5a oa b 7 
Gy = 35 + t + et 
EX} = Gilt) = f+ yan 
GU = 48; 
Varl(X) = Gy) + Gy) - (G\1)) 4 + $+ 2-2) =2 
qd 3.41 
4°74 
At 
6 —— b 8 
* B08 — 20° 
Var(Z) = G21) + Gi(1) - (G.(1)’) = 44 + 8 - (8)? = 2 
7 a (0.7 + 0.30)°(0.6 + 0.40)° 
Lo 5f2t, 3\° (3t, 7 ) 
bee (3 + «(35 +70 
2t . 3)’ (2 7 ) 
(3 e2) “\40;" 16 
2 
EX) = Gy(1) = 2 = 3.5 
8 a geese ob Ga) oe Grae) 
9 a 2 b 2.5 c (Et + 203 + 245) 


an OF 1 
d G0 == rues 
EY) = Gi) = 4 


Gyo = 44+ d+ 20 




EX) = Gy) =3 
2E(X) -1=4=E(Y) 
Challenge 
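1 Let $G_X(t) = i_0 + i_1 t + i_2 t^2 + i_3 t^3 + \dots$
So, $E(X) = G'_X(1) = i_1 + 2i_2 + 3i_3 + \dots$
From the question, we have:
$G_Y(t) = t^b G_X(t^a) = t^b(i_0 + i_1 t^a + i_2 t^{2a} + i_3 t^{3a} + \dots)$
$= i_0 t^b + i_1 t^{a+b} + i_2 t^{2a+b} + i_3 t^{3a+b} + \dots$
$E(Y) = G'_Y(1)$
$G'_Y(t) = b i_0 t^{b-1} + (a + b) i_1 t^{a+b-1} + (2a + b) i_2 t^{2a+b-1} + (3a + b) i_3 t^{3a+b-1} + \dots$
$= (b i_0 t^{b-1} + b i_1 t^{a+b-1} + b i_2 t^{2a+b-1} + b i_3 t^{3a+b-1} + \dots)$
$+ (a i_1 t^{a+b-1} + 2a i_2 t^{2a+b-1} + 3a i_3 t^{3a+b-1} + \dots)$
$G'_Y(1) = (b i_0 + b i_1 + b i_2 + b i_3 + \dots) + (a i_1 + 2a i_2 + 3a i_3 + \dots)$
$= b(i_0 + i_1 + i_2 + i_3 + \dots) + a(i_1 + 2i_2 + 3i_3 + \dots)$
$= b(1) + aE(X) = aE(X) + b$ as required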
2 a $\dfrac{0.6t}{1 - 0.4t}$

b Let the random variable X represent the number of
shots required to hit the bullseye twice,
X ~ Negative B(2, 0.6)

We know that the probability generating function
for a negative binomial is $\left(\dfrac{pt}{1 - (1 - p)t}\right)^r$
Let H(t) be the pgf of X

So $H(t) = \left(\dfrac{0.6t}{1 - 0.4t}\right)^2 = (G(t))^2$ as required.


c (G(t)* 


Mixed exercise 7 


1 


2 


1 


a7 
biG 
c mm + 2t + 3¢°)(1 + t+ 20?) 
d as Tor 0.1518 (4 dp.) 
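X ~ Geo(p)
We know that the pgf of a geometric distribution is
$G_X(t) = \dfrac{pt}{1 - (1 - p)t}$
$G'_X(t) = \dfrac{p}{t(p - 1) + 1} - \dfrac{pt(p - 1)}{(t(p - 1) + 1)^2}$
$E(X) = G'_X(1) = 1 - \dfrac{p - 1}{p} = \dfrac{1}{p}$
$G''_X(t) = \dfrac{2pt(p - 1)^2}{(t(p - 1) + 1)^3} - \dfrac{2p(p - 1)}{(t(p - 1) + 1)^2}$
$G''_X(1) = \dfrac{2(p - 1)^2}{p^2} - \dfrac{2(p - 1)}{p}$
$\mathrm{Var}(X) = G''_X(1) + G'_X(1) - (G'_X(1))^2$
$= \dfrac{2(p - 1)^2}{p^2} - \dfrac{2(p - 1)}{p} + \dfrac{1}{p} - \dfrac{1}{p^2}$
$= \dfrac{2 - 2p}{p^2} + \dfrac{p - 1}{p^2} = \dfrac{p^2 - p^3}{p^4} = \dfrac{1 - p}{p^2}$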




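3 X ~ B(5, 0.4)
$P(X = x) = \binom{5}{x} \times p^x \times (1 - p)^{5-x}$

$G_X(t) = \sum_{x=0}^{5} t^x P(X = x)$

$= \sum_{x=0}^{5} t^x \times \binom{5}{x} \times 0.4^x \times (1 - 0.4)^{5-x}$

$= \sum_{x=0}^{5} \binom{5}{x} \times (0.4t)^x \times (1 - 0.4)^{5-x}$

$= \left(\binom{5}{0} \times (0.4t)^0 \times (1 - 0.4)^5\right) + \left(\binom{5}{1} \times (0.4t)^1 \times (1 - 0.4)^4\right) + \left(\binom{5}{2} \times (0.4t)^2 \times (1 - 0.4)^3\right) + \dots$

Binomial expansion of $(a + b)^n$ with a = 0.6 and b = 0.4t
and n = 5
Hence $G_X(t) = (0.6 + 0.4t)^5$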

4 a X ~ Geo(4/15)

b $G_X(t) = \dfrac{\frac{4}{15}t}{1 - \frac{11}{15}t} = \dfrac{4t}{15 - 11t}$
c ik i 12 
2 
d 20t 


(15 - 11012 - 70) 
e 122; 3.6976 (4 d.p.) 


ie) ii 1-(e°5 4 19-05 4 16-05) = 1—-Be-05 
2 8 8 
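b From the previous part of the question we have:
$P(X = 0) = e^{-0.5}$; $P(X = 1) = \frac{1}{2}e^{-0.5}$;
$P(X = 2) = \frac{1}{8}e^{-0.5}$; $P(X = 3) = 1 - \frac{13}{8}e^{-0.5}$

$G_X(t) = e^{-0.5} + \frac{1}{2}e^{-0.5}t + \frac{1}{8}e^{-0.5}t^2 + \left(1 - \frac{13}{8}e^{-0.5}\right)t^3$
$= t^3 + e^{-0.5}\left(1 + \frac{1}{2}t + \frac{1}{8}t^2 - \frac{13}{8}t^3\right)$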
0.498; 0.488 
7; /10 
i 0 it 
Fi t ae i 

(4 — 30°)? (2 — t)(4 - 30)? 


d EZ) =G,(1) 
G, = 


nN 
ase 0 


503 _ 302 
(3¢- 4)°(¢- 2) (3t - 4)" - 2)5 
6t? 
(3¢ - 4)%(¢ - 2)° 
i 5 3 
G,() = - 
e~ (3-42 -2)° (8-420 - 2)5 
eg _ 8 __ 4a 
(3 — 4)3(1 - 2)5 
$G_X(t) = k(9t^6 + 12t^5 + 4t^4 + 6t^3 + 4t^2 + 1)$
1 = 9k + 12k + 4k + 6k + 4k + k
1 = 36k hence, k = 1/36


b i 
ec Gyo 


_ 30, St! 

2 3 9 2 9 
BX) = GI =$+3 4945425 
d st + 2° + 349)? 


3 pe 
4t t at 
Challenge
a i P(*=0)=G,0)= tan(0) =0 
es = at 
ii G, = tan( : fe 
Use the Maclaurin Expansion of tan(7} at 2 
Ge 4 4 3 
Which gives P(X = 1) = a 


iii Use the Maclaurin expansion from part ii, which 
gives PLY = 2)=0 


ao) 
ae 1 
x(tan| 1 + - 


b EX) =G\)= ri =s 
Var(X) = GY) + GY) — (Gy (1)? 
2tan(= my? 
Aue T tan(#)(tan($) + 1) _ 
< 8 4 
_m? om (my? _ a _ 
Hence, Var(X) 4 + 5 (5) 3 E(x) 
Tw? 
192 
Prior knowledge 8 
1 H₀: μ = 10 °C; H₁: μ > 10 °C. Accept H₀.
2 a X2=10;X<1 


b X2=20 


Exercise 8A 




a X26 b 0.0197, 0.9527 
a X<1 b 0.0076, 0.9757 
a {X¥<1}uU{x>9} b 0.0278, 0.9519 
a X211 b 0.0426, 0.9015 
a X=0 b 0.0111, 0.9698 
a {X <3}U{X= 16} b 0.0433, 0.9494 
a X2=15 b 0.0440, 0.5123 
a X> 229 b 0.0100, 0.8989 
a {X < 2} U(X = 369} b 0.0447, 0.8100 
a i Type I error: reject H₀ when H₀ is true

ii Type II error: accept H₀ when H₀ is false
b {(X < 13} U {X= 748} c 0.1007 
a X25 b 0.048 c X=1 
d 0.05 e 0.9412 f 0.9162 


Exercise 8B 


1 


% > 51.5605... 

0.01 

0.0162, awrt 0.016 

xX < 29.178 

0.05 

0.0869, awrt 0.087 ~ 0.088 

{X < 37.939} U{X > 42.061} 

0.01 

0.5319, awrt 0.53 

x < 14.608 or x > 15.392 

0.1492, awrt 0.1492 

x > 42.4025... 

0.6103, awrt 0.61 

Only way to reduce P(Type IJ) error without 
changing the significance level is to increase the 
sample size. Altering the significance level can 
increase the chances of a Type 1 error occurring. 




Exercise 8C 


1 
2 


a x > 20.9869... 
a 0.0196 


b 0.3757, awrt 0.378 
b 0.0247 
3
4 


moan 


0 


a 
0 
a 
b 
c 


a 


b 


eeana oa 


0.0111 

.3522, awrt 0.352 
0.0548 
0.8791 
The test is more powerful for values of p further 
away from p = 0.3 
A Type I error is when Hy is rejected when H, is in 
fact true. 
The size of a significance test is the probability of a 
Type I error occurring. 
0.0569 
X25 
2<X< 368 
0.0490 
0.8714 


b 0.0166 (3 s.f.) 


0.5904 
0.0402 
0.6723 
0.3127 


ne 


Challenge 
a No more than 5 boxes can be inspected 
b 0.4583 


Exercise 8D 


1 


a 
b 


0.0430 
$P(X \leq 2 \mid \lambda) = e^{-\lambda} + (e^{-\lambda} \times \lambda) + \dfrac{e^{-\lambda}\lambda^2}{2}$; take out a factor
of $e^{-\lambda}$ to get desired answer
λ = 2 ⇒ s = 0.6767 = 0.68 (2 d.p.)
λ = 5 ⇒ t = 0.1247 = 0.12 (2 d.p.)

[Graph: power of the test plotted against λ, for λ from 0 to 6]


Correct conclusion is arrived at when λ = 6.5,

H₀ is accepted. So since size is 0.0430 probability of
accepting λ = 6.5 is 0.957. For λ < 6.5,
correct conclusion is to reject H₀.

So require where power > 0.5 i.e. λ < 2.65 (from
graph).




0.0421 
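$P(X \leq 2) = \binom{12}{0} \times p^0 \times (1 - p)^{12} + \binom{12}{1} \times p^1 \times (1 - p)^{11} + \binom{12}{2} \times p^2 \times (1 - p)^{10}$

$= (1 - p)^{12} + 12p(1 - p)^{11} + 66p^2(1 - p)^{10}$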

0.2528 

0.0547 b 0.6778 

The test is more powerful for values of p further 
away from 0.4 

0.0547 

Test A uses the binomial expansion and the power 
function is part of the binomial expansion for x = 8, 
9,10 

0.0615 

(1 — p)’ (2-1 — p)’) 

p = 0.25 = power, = 0.5256 

p = 0.35 = power, = 0.2616 


Use test A as this is always more powerful. 
0.009 b (1-p)” 
0.01 b p c p? 


The test with 10 trials has larger power. 


Mixed exercise 8 




a 




X29 b 0.0422 c 0.3036 
X=0 b 0.0302 c 0.0498 
X < 6.614... or X > 9.3859... b 0.05 
awrt 0.707 ~ 0.708 

0.293 ~ 0.292 

0.0001318 b 0.3566545 

0.0866 

i r = 1 − 0.9489 = 0.0511
s = 1 − 0.7440 = 0.2560
t = 1 − 0.3239 = 0.6761
ii [Graph: power (vertical axis, 0 to 1) plotted against λ (horizontal axis, 0 to 10)]


6 a 0.0424 


b Power = P(X ≤ 3 | X ~ B(15, p))


c 


vl 
2 


p = 0.2 ⇒ s = 0.6482
p = 0.4 ⇒ t = 0.0905
[Graph: power (vertical axis, 0 to 1) plotted against p (horizontal axis, 0 to 0.5)]


H₀: λ = 2 (Quality the same)   H₁: λ > 2 (Quality is poorer)
Critical region X > 5
0.3712

H₀: λ = 2   H₁: λ > 2

Critical region X > 5
0.1847

Critical region X > 11
0.294


Second test is more powerful as it uses more days. 


0.0620 


P(X ≤ 2 | μ) = e^{-μ} + (e^{-μ} × μ) + (e^{-μ} × μ^2)/2; take out a
factor of e^{-μ} to get desired answer


s = 0.6767
t = 0.1247


[Graph: power plotted against p (horizontal axis, 0.15 to 0.4)]


10


11


i 0.325 

ii With p greater than this value, the technician’s 
test is stronger than the supervisor’s. 

Test is more powerful for probabilities closer to 

zero, quicker to test 5 than to test 10 

0.0839 

0.38 (2 d.p.) 

X>=9 

0.0403 

0.91 (2 d.p.)
f [Graph: power (vertical axis, 0 to 1) plotted against λ (horizontal axis, 0.4 to 1)]
g i 0.63
ii With λ greater than this value, the manager’s
test is more powerful.


Challenge 
a 0.0423 
b 1 − (1 − p)^{12} − 12p(1 − p)^{11}
c 0.0173
d 1 − (1 − p)^{12} − 12p(1 − p)^{11} from part b; with q = 6p,
substitute p = q/6 to get the answer
e 0.0087 + 14.8695q*- 23.7912q° + 9.9134q° 
f [Graph: power plotted against q (horizontal axis, 0.05 to 0.4); one curve is labelled Jane’s test]
g The power of Jane’s test is greater than that of Emma’s
when 0.1 < q < 0.4


Review exercise 2 


1 


a p=0.1 

b r=28.5 (1d.p.), s = 100-91 = 9.0 (1d.p.) 

c t.s. > c.v. so reject H₀.
(significant result) binomial distribution is not a
suitable model

d Defective items do not occur independently or not
with constant probability.

a B(5, 0.5)

b Insufficient evidence to reject H₀.
B(5, 0.5) is a suitable model.
No evidence that coins are biased.


r = 10.74, s = 30.20, t = 3.28
c H₀: B(10, 0.2) is a suitable model for these data.
H₁: B(10, 0.2) is not a suitable model for these data.
d Since t < 5, the last two groups are combined and
ν = 5 − 1 = 4, since there are then 5 cells and the
parameter p is given.


10 


11 


12 
13 


14 


15 


16 


17 




e 4.17 < 9.488 so not significant or do not reject null 
hypothesis. 
The binomial distribution with p = 0.2 is a suitable 
model for the number of cuttings that do not grow. 
Critical value χ²₂(5%) = 5.991
From Poisson, 5.47 is not in the critical region so accept
H₀. Number of computer failures per day can be
modelled by a Poisson distribution.
a Reject H₀.
Conclude there is evidence of an association 
between Mathematics and English grades. 
b May have some expected frequencies <5 (and 
hence need to pool rows/columns). 
3.841 > 2.59. There is insufficient evidence to reject H₀.
There is no association between a person’s gender and 
their acceptance of the offer of a flu jab. 
14.19 > 9.210 so significant result or reject null 
hypothesis. 
There is evidence of an association between course 
taken and gender. 
3.47619 < 9.488 
There is no evidence of association between treatment 
and length of survival. 
2.4446 < 5.991 
so insufficient evidence to reject H₀.
No association between age and colour preference. 
3.2 < 15.1 therefore no evidence to suggest it is not 
uniformly distributed 
a 24.8, 14.88, 8.928, 5.3568, 8.0352 
b H₀: X ~ Geo(0.4) is a suitable model;
H₁: X ~ Geo(0.4) is not a suitable model.
c 12.5047 < 13.3 therefore no evidence to suggest
the model is not suitable. 
d Reject null hypothesis since critical value < test 
statistic. 


1 1 
a 36 b 3 
0.24t 
24 0462 
aes ® ear © T-0.76t 
0.76 + 0.244)" (2) 
d ( + t) e 1-076 
P(X = x) = e^{-1.7} × 1.7^x / x!
G_X(t) = Σ P(X = x) t^x
= Σ e^{-1.7} × 1.7^x × t^x / x!
= e^{-1.7} Σ (1.7t)^x / x!
= e^{-1.7} (1 + 1.7t + (1.7t)^2/2! + (1.7t)^3/3! + ...)
= e^{-1.7} e^{1.7t}
= e^{1.7(t − 1)}
a 1; 1.225 (4s.f.) 
2 4 os 8 
b ia ii = , 
aig b 3 
c EX) = Gi) = 327+ 124+1)=5 


Var(X) = GY(1) + Gi(1) - (Gi (1)? = 3 

d 36.5 

a G(I)=1>k(1+44+2P%=49k=1lSkay 

b a5 

e ER) =G,)=%x 12+4+16+8+1) = 2 
Var(X) = #8 

d t2(8 4406420)


18 a 


19 a 


20 a 


21a 


22 a 


210 


i Type I: H₀ rejected when it is true

ii Type II: H₀ is accepted when it is false
H₀: λ = 5   H₁: λ > 5

P(X ≥ 7 | λ = 5) = 1 − 0.7622 = 0.2378 > 0.05
No evidence of an increase in the number of chicks
reared per year.

P(X ≥ c | λ = 5) < 0.05

P(X ≥ 9) = 0.0681, P(X ≥ 10) = 0.0318, c = 10
P(Type I error) = 0.0318

λ = 8

P(X ≤ 9 | λ = 8) = 0.7166
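
The search for the critical value c can be sketched in a few lines. This is only an illustrative check of the working above, assuming X ~ Po(5) under H₀ and a 5% one-tailed test; the helper names `poisson_cdf` and `upper_tail` are introduced here and are not from the book.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam); returns 0 when k is negative."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))

def upper_tail(c, lam):
    """P(X >= c) for X ~ Po(lam)."""
    return 1 - poisson_cdf(c - 1, lam)

# Smallest c whose upper-tail probability under H0 (lambda = 5) is below 5%.
c = 0
while upper_tail(c, 5) >= 0.05:
    c += 1

size = upper_tail(c, 5)            # P(Type I error)
not_rejected = poisson_cdf(c - 1, 8)   # P(X <= c - 1) when lambda is really 8
```

The loop stops at the first value of c for which P(X ≥ c | λ = 5) drops below 0.05, mirroring the comparison of P(X ≥ 9) and P(X ≥ 10) in the answer.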


(X̄ − 250)/(4/√15) > 2.3263 or < −2.3263

X̄ > 252.40... or X̄ < 247.6...

P(X̄ < 252.4 | μ = 254) − P(X̄ < 247.6 | μ = 254)
= P(Z < (252.4 − 254)/(4/√15)) − P(Z < (247.6 − 254)/(4/√15))
= P(Z < −1.5492) − P(Z < −6.20)
= (1 − 0.9394) − (1 − 1)
= 0.0606
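
The same calculation can be reproduced with Python's standard library as a rough check. The sketch assumes, as the working above suggests, a sample of 15 with σ = 4, a null mean of 250, a critical z-value of 2.3263 in each tail, and a true mean of 254.

```python
from statistics import NormalDist

se = 4 / 15 ** 0.5                      # standard error of the sample mean
z = NormalDist().inv_cdf(0.99)          # about 2.3263
lower, upper = 250 - z * se, 250 + z * se   # boundaries of the acceptance region

# Probability that the sample mean lands in the acceptance region when mu = 254.
alt = NormalDist(254, se)
type_ii = alt.cdf(upper) - alt.cdf(lower)
```

Because the code keeps the unrounded boundaries, `type_ii` comes out at roughly 0.061, slightly different from the 0.0606 obtained after rounding the boundary to 252.4.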


P(X ≤ c₁) < 0.05; P(X ≤ 3 | λ = 8) = 0.0424 ⇒ X ≤ 3
P(X ≥ c₂) < 0.05; P(X ≥ 14 | λ = 8) = 0.0342

P(X ≥ 13 | λ = 8) = 0.0638 ⇒ X ≥ 13

∴ critical region is {X ≤ 3} ∪ {X ≥ 13}


i P(4 ≤ X ≤ 12 | λ = 10) = P(X ≤ 12) − P(X ≤ 3)
= 0.7916 − 0.0103
= 0.7813

ii Power = 1 − 0.7813 = 0.2187
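
A short sketch of the two-tailed calculation follows, assuming the critical region {X ≤ 3} ∪ {X ≥ 13} with X ~ Po(8) under H₀ and λ = 10 as the alternative value. The function names are illustrative choices, not the book's notation.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** i / factorial(i) for i in range(k + 1))

def rejection_probability(lam):
    """P(X <= 3) + P(X >= 13) for X ~ Po(lam): the chance of landing in the critical region."""
    lower_tail = poisson_cdf(3, lam)
    upper_tail = 1 - poisson_cdf(12, lam)
    return lower_tail + upper_tail

size = rejection_probability(8)     # evaluated at the null value
power = rejection_probability(10)   # evaluated at the alternative value
```

Evaluating the same function at the null value gives the size and at any other λ gives the power, so a power function is just `rejection_probability` viewed as a function of λ.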


1- 0.8891 = 0.1109 

Power of test = 1 − P(X ≤ 2 | X ~ B(12, p))
= 1 − (P(X = 0) + P(X = 1) + P(X = 2))
= 1 − ((1 − p)^{12} + 12p(1 − p)^{11} + 66p^2(1 − p)^{10})
= 1 − (1 − p)^{10}(1 + 10p + 55p^2)
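
The factorisation in the last step can be confirmed numerically. The sketch below compares the term-by-term sum with the factored expression over a grid of p values; the grid itself is an arbitrary choice made for illustration.

```python
from math import comb

def power_direct(p, n=12):
    """1 - P(X <= 2) for X ~ B(n, p), summed term by term."""
    return 1 - sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(3))

def power_factored(p):
    """The factored form 1 - (1 - p)^10 (1 + 10p + 55p^2)."""
    return 1 - (1 - p) ** 10 * (1 + 10 * p + 55 * p ** 2)

# Largest disagreement between the two forms for p = 0.00, 0.01, ..., 1.00.
max_gap = max(abs(power_direct(p / 100) - power_factored(p / 100)) for p in range(101))
```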

1-0.5583 = 0.442 

1- 0.00281 = 0.997 

The test is more discriminating (powerful) for the 

larger value of p. 


i The size of a test is the probability of rejecting 
the null hypothesis when it is in fact true and 
this is equal to the probability of a Type I error. 

ii The power of a test is the probability of rejecting 
the null hypothesis when it is not true. 
Power = 1 − P(Type II error) = P(being in the
critical region when H₀ is false)

X ~ B(8, 0.25)

Size = P(X > 6) = 1 − P(X ≤ 6 | n = 8, p = 0.25)

= 1 − 0.9996 = 0.0004
Power of test = P(X > 6 | X ~ B(8, p))

= P(X = 8) + P(X = 7)

= p^8 + 8p^7(1 − p)

= 8p^7 − 7p^8

Power = 8(0.3^7) − 7(0.3^8) = 0.00129

Power = 1 − P(Type II)

P(Type II) = 1 − power

= 1 − 0.00129
= 0.9987...
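
The size, power and Type II error probability for this test can be checked directly from the binomial probabilities. The sketch assumes the critical region X > 6 (that is, X ≥ 7) with n = 8, as in the working above; `reject_prob` is a name introduced here.

```python
from math import comb

def reject_prob(p, n=8, c=7):
    """P(X >= c) for X ~ B(n, p): the chance that the test rejects H0."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))

size = reject_prob(0.25)                   # rejection probability under H0: p = 0.25
power = reject_prob(0.3)                   # rejection probability when p = 0.3
closed_form = 8 * 0.3 ** 7 - 7 * 0.3 ** 8  # the power function 8p^7 - 7p^8 at p = 0.3
type_ii = 1 - power
```

The summed probabilities and the closed form 8p⁷ − 7p⁸ agree, and 1 − power is approximately 0.9987.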

Increase the probability of a Type I error, 

e.g. increase the significance level of the test. 

Increase the value of p. 


23 


24 


25 


0.3311 
Power of test = P(Y S 2|X ~ Po (A) 
(1+ P(3 <X < 4|X ~ Po (A) 
Leading to a power function of 
aa +tAt Fle + # + al 
6 24 
0.5891 
H₀: p = 0.35   H₁: p ≠ 0.35
Let X = number cured, then X ~ B(20, 0.35)
α = P(Type I error) = P(X ≤ 3) + P(X ≥ 11) given
p = 0.35
= 0.0444 + 0.0532
= 0.0976

β = P(Type II error) = P(4 ≤ X ≤ 10)
p              0.2     0.3     0.4     0.5
β              0.5880  0.8758  0.8565  0.5868
Power = 1 − β  0.4120          0.1435
Not a good procedure.
Better further away from 0.35 or
this is not a very powerful test (power = 1 − β)
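
The table of β values can be regenerated from the binomial distribution. The sketch assumes the acceptance region 4 ≤ X ≤ 10 with n = 20, as in the answer above; the helper names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def beta(p, n=20):
    """P(4 <= X <= 10) for X ~ B(n, p): the probability that H0 is not rejected."""
    return binom_cdf(10, n, p) - binom_cdf(3, n, p)

alpha = 1 - beta(0.35)   # P(Type I error) at the null value p = 0.35
table = {p: (beta(p), 1 - beta(p)) for p in (0.2, 0.3, 0.4, 0.5)}   # p -> (beta, power)
```

Each entry of `table` pairs β with the corresponding power 1 − β, so the low power near p = 0.35 is easy to see.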
0.0226 
Power = 1 − P(0) − P(1)

= 1 − (1 − p)^5 − 5p(1 − p)^4

= 1 − (1 − p)^4(1 + 4p)
0.0115 
a=0.47 


[Graph: power functions of the two tests plotted against p (horizontal axis, 0 to 0.3)]


The assistant’s test as at p > 0.2 that test is more 
powerful. 

e.g. The manager believes the actual probability 
is close to 0.05, or that it would be more time or 
resource consuming to take larger samples. 


Challenge 


1 


2 


a 
b 


a 


r=19.15,s=19.15 

12.12 < 15.086 so accept H₀.

The distribution can be modelled by a 
N ~ (360, 20). 

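i E(S) = G'_S(1)


G_S(t) = G_N(G_X(t)) ⇒ G'_S(t) = G'_N(G_X(t)) G'_X(t)
G'_S(1) = G'_N(G_X(1)) G'_X(1) = G'_N(1) G'_X(1) = E(N) E(X)
ii Var(S) = G''_S(1) + G'_S(1) − (G'_S(1))^2
= G''_S(1) + E(N) E(X) − E(N)^2 E(X)^2
G'_S(t) = G'_X(t) G'_N(G_X(t))
G''_S(t) = G'_X(t) G'_X(t) G''_N(G_X(t)) + G''_X(t) G'_N(G_X(t))
G''_S(1) = E(X)^2 G''_N(1) + G''_X(1) E(N)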


Online: Full worked solutions are available in SolutionBank.


b 


Var(S) = E(X)^2 G''_N(1) + G''_X(1) E(N) + E(N) E(X)
− E(N)^2 E(X)^2
= E(N)(G''_X(1) + E(X)) + E(X)^2(G''_N(1) − E(N)^2)
= E(N)(G''_X(1) + E(X)) + E(X)^2(G''_N(1) − E(N)^2)
− E(N) E(X)^2 + E(N) E(X)^2
= E(N)(G''_X(1) + E(X) − E(X)^2)
+ E(X)^2(G''_N(1) + E(N) − E(N)^2)
= E(N)Var(X) + E(X)^2 Var(N)
N ~ Po(λ), so E(N) = Var(N) = λ,
X ~ B(1, p), E(X) = p, Var(X) = p(1 − p).
So E(S) = E(N) × E(X) = λp
Var(S) = E(N)Var(X) + E(X)^2 Var(N)
= λp(1 − p) + p^2 λ = λp − λp^2 + λp^2 = λp
Trials are independent and occur randomly over a
fixed interval.
E(S) = Var(S), so S ~ Po(λp)
N ~ B(n, q), E(N) = nq, Var(N) = nq(1 − q)
E(X) = p, Var(X) = p(1 − p)
E(S) = npq, Var(S) = npq(1 − pq)
Trials are independent, fixed number of trials,
probability of success constant.
S ~ B(n, pq)
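
The two results E(S) = E(N)E(X) and Var(S) = E(N)Var(X) + E(X)²Var(N) can be tried out on the Poisson and binomial cases. The numerical values below (λ = 3, n = 10, q = 0.5, p = 0.4) are hypothetical and chosen only to illustrate the formulas.

```python
def random_sum_moments(mean_n, var_n, mean_x, var_x):
    """Mean and variance of S = X_1 + ... + X_N using the results derived above."""
    mean_s = mean_n * mean_x
    var_s = mean_n * var_x + mean_x ** 2 * var_n
    return mean_s, var_s

p = 0.4                                   # hypothetical success probability for each X_i

# N ~ Po(3): mean and variance should both equal lambda * p.
lam = 3.0
poisson_case = random_sum_moments(lam, lam, p, p * (1 - p))

# N ~ B(10, 0.5): mean npq and variance npq(1 - pq).
n, q = 10, 0.5
binomial_case = random_sum_moments(n * q, n * q * (1 - q), p, p * (1 - p))
```

In the Poisson case the mean and variance coincide at λp, and in the binomial case they match npq and npq(1 − pq), consistent with S ~ Po(λp) and S ~ B(n, pq).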
i mean = £1575, σ = £227 (nearest £)
ii mean = £420, σ = £108 (nearest £)


Exam-style practice: AS level 


1 
2 
3 


a 
a 
a 




0.12; 0.43 b 1.8924 c 0.78 

0.0424 b 0.5940 c 0.5520

H₀: There is no association between sport and
gender.

H₁: There is an association between sport and
gender.

6.150 c 2

Fail to reject H₀: the critical value for χ² is 7.378
which is > test statistic, therefore not significant.
Reject H₀ since new critical value is less than the
test statistic (5.99).

3.75; 3.73125 b 0.5162

n is large and p is small, thus mean ≈ variance
0.565

H₀: Binomial is a suitable model; H₁: Binomial is not
a suitable model

χ² test statistic = 4.89 (2 d.p.), critical value = 6.25
(3 degrees of freedom)

Fail to reject H₀; there is evidence to suggest that
the binomial model is suitable.




Exam-style practice: A level 

H₀: Geo(0.4) is a good model; H₁: Geo(0.4) is not a good
model

Test statistic = 14.87, critical value = 9.488 so reject 
null hypothesis; Geo(0.4) is not a good model 


1 


a 
b 
c 






G_X(1) = 1 ⇒ k(1 + 2 + 3)^3 = 1 ⇒ k = 1/216
a 


72 
G'_X(1) = 4, G''_X(1) = 41/3
Var(X) = G''_X(1) + G'_X(1) − (G'_X(1))^2

= 41/3 + 4 − 16 = 5/3
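
The mean and variance can be cross-checked by expanding the generating function. The sketch below assumes that the p.g.f. is G_X(t) = k(1 + 2t + 3t²)³; this form is inferred from the working above rather than stated in it, so treat it as an assumption. Exact fractions are used so that the results can be compared directly.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

base = [Fraction(1), Fraction(2), Fraction(3)]   # assumed factor 1 + 2t + 3t^2
cube = poly_mul(poly_mul(base, base), base)      # (1 + 2t + 3t^2)^3

k = 1 / sum(cube)                 # G_X(1) = 1 fixes k
probs = [k * c for c in cube]     # P(X = x) is the coefficient of t^x

g1 = sum(x * p for x, p in enumerate(probs))             # G'_X(1) = E(X)
g2 = sum(x * (x - 1) * p for x, p in enumerate(probs))   # G''_X(1) = E(X(X - 1))
variance = g2 + g1 - g1 ** 2
```

Under that assumption the expansion gives k = 1/216, G'_X(1) = 4 and G''_X(1) = 41/3, matching the values used in the variance calculation.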


= B 2 2)2\)3 — a 2 4)3 
Gy= a6 it + le) + 300%) = a6 t+ 2t +304) 
Y ~ Negative binomial(r, p); Throws are 
independent and probability is constant. 
0.35 c 7 
0.0255 
Probability (0.0780) is greater than significance 
level so not enough evidence at the 5% significance 
level to say number of flaws has been reduced. 
0.3, 0.2, 0.05 b 0.3 
15; 14.55 
Number of penalty kicks is large; probability of 
missing is small. 
0.2511 
Probability using binomial = 0.2485 which agrees 
to 2 s.f. therefore accurate. 
X ≤ 1; 0.0629
Power of test = P(X = 0) + P(X = 1)

= (1 − p)^{25} + 25p(1 − p)^{24}

= (1 − p)^{24}(1 + 24p)
0.1122; (1 — p) 
Philip’s test has a smaller size therefore better. 
Philip’s test has a greater power therefore better 
(Philip = 0.3286, Gemma = 0.3225).
actual significance level 147
alternative hypotheses 59, 66, 92 


binomial distribution 

degrees of freedom 105 

expected value/mean 32-3 

negative see negative binomial 
distribution 

parameter 67 

Poisson distribution as 
approximation to 34-5, 60

probability generating 
function 132 

probability mass function 32 

testing as model 105-8 

variance 32-3 


cells 96 
central limit theorem 76-84
applied to normal 
distribution 77-8 
applying to other 
distributions 80-1 
definition 77 
chi-squared distribution 97-102 
critical regions 99 
critical value 98, 99 
degrees of freedom 97 
hypothesis testing 99-102 
constant average rate 24 
constraints 96 
contingency tables 113-16 
degrees of freedom 114-15 
expected frequency 114 
selecting model 114 
setting hypotheses 114 
continuous distributions 154 
critical regions 
chi-squared distribution 99 
geometric distribution 69-71 
Poisson distribution 62-4
critical values 62, 98, 99
cumulative distribution
functions 21-2
geometric 44-5
Poisson 59 
cumulative distribution tables 21 


degrees of freedom 96-7 
binomial distribution 105 
chi-squared distribution 97 
contingency tables 114-15 
discrete uniform distribution 104 
Poisson distribution 109 

dice rolling 44, 49-50, 92-3 

discrete data, testing goodness of fit 

with 103-10 


discrete random variables 1-18 
expected value 2-3 
probability generating 
function 129-30 
solving problems involving 11-13 
variance 5-6
discrete uniform distribution
degrees of freedom 104
testing as model 104-5
distributions 


binomial see binomial distribution 


chi-squared see chi-squared 
distribution 

continuous 154 

geometric see geometric 
distribution 

normal see normal distribution 

Poisson see Poisson distribution 


expected value (mean) 
binomial distribution 32-3 
discrete random variable 2-3 
finding by differentiating 
p.g.f. 135-6 
of function of X 7-10 
geometric distribution 47 
negative binomial 
distribution 52-3 
Poisson distribution 30, 59-60 
of X? 3 
exponential function, as infinite 
series expansion 20 


geometric distribution 43-7 
central limit theorem applied 
to 62 
critical regions 69-71 
cumulative 44-5
expected value/mean 47 
goodness of fit tests 119-20 
hypothesis testing 66-8, 69-71 
parameter 66-8 
probability function 44 
probability generating 
function 133 
variance 47 
goodness of fit 92-4
geometric distributions 119-20 
testing with discrete data 103-10 


hypothesis formation 92-4
hypothesis testing 58-75
alternative hypotheses 59, 66, 92 
chi-squared distribution 99-102 
comparing tests 164 
geometric distribution 66-8, 
69-71 


null hypotheses 59, 66, 92 

one-tailed tests 59, 62, 99

Poisson distribution 59-60, 62-4

power function 162-5 

power of test 157-60 

quality of tests 146-72 

size of test 157-60 

two-tailed tests 60, 62, 64 

types of error see Type I errors; 
Type II errors


Maclaurin expansion 20 
mean see expected value
modelling, with Poisson distribution 23-4


negative binomial distribution 
49-53 
central limit theorem applied 
to 80, 81 
expected value/mean 52-3 
and number of trials needed 49 
probability function 50 
probability generating 
function 133-4 
variance 52-3 
normal distribution 
central limit theorem applied 
to 59-60 
finding Type I and Type II errors
using 153-6
sample mean approximately 
follows 77-78 
null hypotheses 59, 66, 92 


observed frequencies 96 
one-tailed tests 59, 62, 99 


parameters 
binomial distribution 67 
geometric distribution 66-8 
Poisson distribution 20, 23, 67 
p.g.f. see probability generating 
functions 
Poisson distribution 19-42
adding 27-8
as approximation to binomial
distribution 34-5, 60
central limit theorem applied
to 80
critical regions 62-4
cumulative 59
degrees of freedom 109
expected value/mean 30, 59-60
hypothesis testing 59-60, 62-4
modelling with 23-4
parameter 20, 23, 67 
probability generating
function 132-3
testing as model 108-10
variance 30 
power function 162-5 
power of test 157-60 
probability distributions see 
distributions 
probability generating functions 
(p.g.f.) 128-45 
binomial distribution 132 
differentiating to find mean and 
variance 135-6 
discrete random variables 129-30 
as expectation of function of 
random variable 129 
geometric distribution 133 
G,(1) = 1 property 129 
negative binomial 
distribution 133-4
Poisson distribution 132-3 
sums of independent random
variables 139-40
random variables 2
discrete see discrete random 
variables 
restrictions 96 


sample mean 76-7 
as normally distributed 77-8 
sample variance 153 
significance levels 60, 63, 99 
actual 147 
target 154 
size of test 157-60 
standard deviation 5, 47 


target significance level 154 
tests 
one-tailed 59, 62, 99 
two-tailed 60, 62, 64 
see also goodness of fit; hypothesis 
testing 
trials, number needed 49 
two-tailed tests 60, 62, 64 


Type I errors 147-51
finding using normal
distribution 153-6
probability of 157-60
relationship with Type II errors
155-6
Type II errors 147-51
finding using normal
distribution 153-6
probability of 157-60
relationship with Type I
errors 155-6


variance 
binomial distribution 32-3 
discrete random variable 5-6
finding by differentiating p.g.f. 136 
of function of X 7-9 
geometric distribution 47 
negative binomial distribution 52-3 
Poisson distribution 30 
sample 153 


